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we study the general problem of reconstructing a function, defined 
ice, from a set of incomplete, noisy and/or ambiguous observations, 
is work is to demonstrate the generality and practical value of a 
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in the form of a Gibbsian probability distribution on the 
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>r low signal to noise ratios) and the computational efficiency. 

he Bayesian approach to the solution of several problems, some of 
and solved in these terms for the first time. Specifically, 
are: the reconstruction of piecewise constant functions from noisy 
nstrucuon of piecewise continuous surfaces from sparse and noisy 
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of these applications, we develop fast, deterministic algorithms 
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mental problem in the design and analysis of systems endowed with 
ities is the construction of internal representations of the physical 
the external world. The precise form of these representations is not well 
and is the subject of much current research in Artificial Intelligence 
)gy. It is clear, however, that these representations should integrate 
knowledge about the physical properties of the external world with 
from a number of different sensory modalities. Furthermore, in 
( ffectively action-oriented, the representations should provide compact 
)f the physical structures of interest at different levels of detail. 
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Chapter 1 
INTRODUCTION 



problem is not exclusive of biological perceptual systems; it arises 

rmation from a set of sensors has to be processed, stored and retrieved 

way. Thus, it is of fundamental importance, for example, in the 

coifiputer vision systems; in the reconstruction of subterranean geological 

m geophysical data and in the design of biomedical imaging systems. 

m^tivatfcn for this thesis is to increase our understanding of the principles 

rig t|e process of integrating prior generic constraints with the available 

for the construction of these representations. In particular, we will 

em of reconstructing, in a way that is consistent with the available 

the value of certain properties of the physical structure of interest over 

region of space. 
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To Mnfe these early perceptual processes in a more precise way, let us model 
the specific p -operties of the physical structure as functions / that map a (compact) 
region fl C Z n into k m . In the most interesting cases, / will be either a scalar 
(m = 1) or | vector field (m = 2) defined on a two-dimensional region. This is 
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example, of the problems of image restoration and segmentation, and 
recovery of: depth from stereo; lightness; shape from shading; and the 
of optical flow in computer vision, as well as many problems in the 
of Jeological structure from geophysical measurements. 

W <i will assume that the available data consists of several sets of qualitatively 
different melsurements {g\, ...,9 m) that in general are modeled as: 
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.q>/ d motes the derivative of the property /; n,- is a noise process, and Hi 

operator (for example, in vision problems, the different measurements may 

: stereo disparity; brightness; color, etc.). We will also assume that 

is collected with different sampling patterns {Si,.. ., S M }, that is, 

gi are defined only on the finite set 5 t - C fi. Since most physical 

onsist of events that occur at a variety of scales, and in general, 

ely different scales have little influence on one another, the numerical 

of the behavior of a property over a range of scales can be used 

produce a physically meaningful hierarchical decomposition of the 

structure into individual substructures ("objects") which can be subsequently 

symbolic forms that are more compact and easy to manipulate (see 

tnd 1982; it is not surprising that there is psychophysical evidence 

presence of a multiscale processing hierarchy in the human visual 

(Jampbell and Robson, 1977, and Marroquin, 1976). 

solutions we are looking for consist on a family {/ Q } of numerical 
>f the function / at different scales (indexed by a) at the sites of some 
(the finest scale representation should correspond to the best estimate 
value of / at the sites of L). To illustrate this idea, in figure 1-a we 
^ bin try pattern, and in figures 1-b through 1-e, its numerical representation 
y coarser scales. This family of descriptions was generated by the 
cribed in section 5 of chapter 4. 

e;enei il, the observation processes g,- do not determine the value of / in a 
dnd s able way (that is to say, these problems are ill-posed in the sense of 
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ird; :ee Poggio and Torre, 1984). Therefore, the algorithms we are looking 
Id \ e able to regularize the problem by incorporating constraints on the 
generated by some prior knowledge about its general characteristics. 

ally, because of the large number of variables involved, reasonable speed 
rmaice will usually require that these algorithms be distributed, and thus, 
in piementable in parallel hardware. 
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Representation of the binary pattern (a) at increasingly coarser scales. 
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the most successful solutions to these type of problems are those 
e them as variational problems, where the measurement and generic 
constraints ale separately represented in the following way: 



Let us consider the case of only one set of "perfect" measurements (i.e., with 
no noiso) g c jfined on the set 5, and suppose that the constraints that they impose 
on the iblutiln can be expressed in the form: 



J A(f i9 ) = 
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positive definite, real valued function that measures the incompatibility 
)f the property / with the observations g. In general, the observations 
erfect, and so, we will only require that the error f s A(f, g) be small. 
:re may be a large number of configurations that minimize the error, 
lique solution, an assumption about the global smoothness of / is 
means of some positive definite, real valued function P(f,Df,...) 
the "jaggedness" of /. If both A and P are convex, the desired 
?e the unique minimizer of the "energy" functional: 

U(f, g) = j s A(f, g) + \f Q P(f, Df, . . .) (1) 

3arameter. 



of 



has been applied with varying degrees of success to the 

enWs ofl surface interpolation (Grimson, 1981b, 1982; Terzoupulos, 1983, 

fomdutation of visual motion (Horn and Schunk, 1981; Hildreth, 1984a,b); 

of slape from shading information (Ikeuchi and Horn, 1981); computation 

(tfctive contours (Ullman, 1976; Brady et al. f 1980; Horn, 1981); lightness 

1J974) and edge detection (Torre and Poggio, 1983). 

0t reclnt paper, Poggio and Torre (1984) have shown how functional of 
of equation (1) can be derived in a rigorous and systematic way using 
regularijtatioi methods (Tikhonov, 1963; Tikhonov and Arsenin (1977); Wahba 
(1980); in thi: context / n P is called a stabilizing functional, and X, the regularization 
parameter) 

On:e th \ functional (1) is specified, its minimization can be carried out by 
standard vari itional methods (Courant and Hilbert, 1953). Since usually one is 
interested in he value of/ only at the discrete set of points L, the solution of the 
resulting Eul< r-Lagrange partial differential equations can be obtained as the fixed 
point of) a relocation (cooperative) algorithm of the form: 

/i* +1) = Fi{fW) i e L (2) 

ifl algorithm can be efficiently implemented in parallel hardware using a 
locally connected processors (one for each site i), or even by some 
k (see Poggio and Koch, 1984). 



11 



It 

embed 
obser 
definin 
Poggio 
in c 



s in 
:he 



\z :ions 



hap er 



a 

197( ) 
6. 



It 
ments 



> alsc 

ito i 



functio al 



Su >pos( 
constra nts o 



Th ;soh 



If 
the mimmiz 
(this adproacfi 
problen 
(Df) at 



Th : 



regulari :atioi 
formula ion 
algorith ns. I 



p-ior 



g 3bal 



whin 



doi lain 



cresting to note that it is also possible, and sometimes easier, to 
knowledge about the solution, and the constraints imposed by the 
directly in a cooperative network of a given form, without explicitly 
variational principle. This approach has been used by Marr and 
for the stereo matching problem. We will have more to say about it 



possible, in principle, to incorporate qualitatively different measure- 
single cooperative process, by a simple modification of the energy 

that we have M sets of measurements, and that each set g t - places some 
1 / (and/or its derivatives) which can be expressed by the functionals: 



f St M9i> /,#/,...) = i = 1, . . ., M 



tion will now be constructed as the global minimizer of the functional: 

M 



U(f) = £ ".•(/, 9) f Si A + X f Q P(/, Df, . . .) 



(3) 



where t le p* rameters a t - measure the relative weight we wish to assign to each set 
of meai aremfents. 

ill 



thje functions A, are convex, the solution will again be unique, and 

tion of (3) may be carried out by means of a cooperative network 

has been used by Terzopoulos (1985) for the surface interpolation 

the depth value / is known at some set S\ of sites, and the slope 

a different set S 2 ). 



approach we have been discussing — which we will call the "standard 

method" is very attractive: it provides a unified framework for the 

Df a variety of problems, and it leads to computationally efficient 

owever, it has some important limitations (some of them pointed out 

by Poggio an i Torre): 

(i) Vejy oft m the assumption that the solution / is smooth over the whole 
1 is not justified. What is more commonly true is that fi can be 
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clejir, arid they often have to be selected on a purely empirical basis. 

cases, the choice of the particular (often quadratic) form for the 

A and P is arbitrary, and is determined mainly by the tractability 

iqueness problem for the solution, and by the simplicity of the 

Tiinimization algorithm (in some cases, of course, there may be 

ioretical or experimental considerations that justify this choice). 
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ed into a small set of disjoint connected regions, and that while / 
h in the interior of each of them, it has discontinuities aJong the 
ies between regions (which in turn are piecewise smooth curves), 
itation is a serious one, because very often the discontinuities of 
the regularization methods tend to hide, are the most important 
the surface, in particular if one is trying to compute a symbolic 
tation for it. 
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nt approach is to model the function /, whose reconstruction solves 

problem, as a random field that has to be estimated from a set of 

pbssibly ambiguous measurements. Within this formulation, one can 

sian viewpoint (see Good, 1983), and assume that the best way of 

prior knowledge about the nature of the solutionis in the form of a 

}roba|bility distribution P f . This distribution, together with a probabilistic 

the noise that corrupts the observations, allows one to use Bayes 

t|<t> coripute the posterior distribution P^ g9 which represents the likelihod of 

;iven the observations g. In this way, we can solve the reconstruction 

nding the estimate / which either maximizes this likelihood (the so 

Nfaxinlum a Posteriori or MAP estimate), or minimizes the expected value 

to Pj\ g ) of an appropriate error function. This formulation has several 

er the "Standard Regularization" approach: 
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3. Precise In erpretation. 



we v 
istic 
will 



nple modifications in the prior probabilistic model for /, one can 
rithms that reconstruct not only smooth, but piecewise constant or 
ntinuous functions. It is also possible to include explicitly into the 
knowledge about the geometry of the curves that bound the smooth 
about the discontinuities) of/. 



r\J I gu • • .gM) = ^m n . x 



niii P(9i) 



Thb parlimeters that appear in the reconstruction algorithms that are derived 

this ap iroach have a precise statistical interpretation (for example, the relative 

if thl evidence provided by each set of observations, will be determined 

of the associated noise process); also, the plausibility of the 

tions about the behavior of the solution can be explicitly verified 

sample functions of the random field defined by P ft by means of 

appifppnate Monte Carlo procedure. Finally, one can choose the precise loss 

whdse expected value will be minimized by the Bayesian estimator. 



3. Computati )nal Efficiency. 



ill see, if the random field defined by P f is Markovian (i.e., if the 
iependencies are local), the estimation algorithms will be distributed, 
be possible to implement them efficiently in parallel hardware. 
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3. Goa 



The 



described tc 
we wil 



1. Present 
be used 



2. Devdlop impropriate loss functions, and the corresponding optimal estimators for 
different clai ses of problems. 



3. Develop i 

4. Apply th( 
and practica 



W* 



$ of 



ob ective of this work is to apply the probabilistic approach we have just 
the solution of a general class of perceptual problems. In particular, 



a 
verj 



nov 



his Thesis. 



class of random fields with local probabilistic dependencies, that can 
effectively to model the behavior of a wide variety of functions. 



eneral distributed algorithms for computing these estimates. 

above results to several specific problems, to illustrate the generality 
value of this approach. 



5. Dev^op i|iore efficient algorithms for each of these particular cases, 
present a list of our main contributions: 



3.1. Summar i of our Main Contributions. 

1. Optimal Ijayesian Estimators. 

Several ksearchers have used Bayes theory and Markov random field (MRF) 
models for he restoration of piecewise uniform images. It has been implicitly 
assumes by most of them that the maximization of the posterior probability 
(which leads to the Maximum a Posteriori or MAP estimator) is the best possible 
performance criterion. We introduce the use of different specific error criteria 
for the ilesig i of the optimal Bayesian estimators for several classes of problems, 
and propose a general procedure (which is based on some existing Monte Carlo 
techniques, s jch as the Metropolis algorithm) for approximating them. We show, 
both th^Oreti rally and experimentally (in particular for the case of the restoration of 
piecewifle uni form images) that this new approach leads to a substantial improvement 
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over 
low 



the exi< 



signal 

2. Novd Ap 

Througlout this thesis we present several examples of the application of the 
probabilistic approach, and of the optimal estimation procedures that we have 
derived, to : everal problems, some of which are formulated and solved in these 
terms fbr th< first time. The results that we get show that this approach can provide 
a unified frai lework for the integration of a variety of related perceptual tasks into a 
single qpope ative process. Also, these results represent, in several cases, a significant 
lmprovftmen : over those obtained using existing schemes. Specifically, these new 
applications ire the following: 



a) 
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one 



c) 



Surfr :e Interpolation. 



ill 
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adapt 

intE:ractibns 
appropr 
majce it 
lustr 



5«igna 



ISO 



detail a 

of 

per 



syntheth 



ing methods, both regarding the quality of the results (particularly for 
noise ratios) and the computational efficiency. 

lications. 



em of reconstructing a piecewise continuous surface from sparse and 
is formulated using a Bayesian approach, using two coupled MRF's 

the behavior of the smooth patches, and of the curves (discontinuities) 
nd them. Although this type of coupled model has been used before 
context of the restoration of piecewise uniform, noisy images), its 
)n to this problem requires some non-trivial modifications: the local 
between the elements of the fields have to be redefined in an 
ate way, and the general estimation algorithm has to be modified to 
computationally feasible. The practical value of the resulting algorithm 
ited using both synthetic and real data. 



Matching. 



Thjp prqblem consists in finding the corresponding points in two signals when 

tained from the other by shifting it by a variable amount. We study in 

►pecific instance: the reconstruction of depth from a stereoscopic pair 

lpiagjs, and show how to formulate it using our general framework. The 

ormince of the algorithms that we construct is also illustrated by means of 

and real examples. 



brm ition of Perceptual Clusters. 
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We sug >cst that the process of formation of perceptual clusters of certain dot 
pattern! can be modeled in terms of the estimation of binary images corrupted by 
ijltipl cative noise, and illustrate the application of our estimation algorithms 



m 
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3. Efficient i 
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possibly 

signi 

develop 

present. 

a) 

W4 

corriputfs 
a rigoro 
scheme 
which a 
estimatk 



We 



this t isk 



magmtuc e 
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to 



refined 



J^stim 
heu 



appimir 



find 



computa 



Jie 



the 



efficiency 



lgorithms. 

Alflhou^h the Monte Carlo procedure that we have developed for approximating 

stimates is perfectly general, for each particular application it is often 

to Jesign alternative (some times deterministic) algorithms that improve 

ficftntly |the computational efficiency. It has been our concern in this work to 

alternative fast algorithms for each one of the applications that we 

fically, we have developed the following algorithms: 

Estimbtion of One-Dimensional Signals. 

prefent a new deterministic algorithm of minimal complexity which 
(exactly) the MAP estimate of binary, one-dimensional MRF's, and 
is proof of its optimal performance. We also develop an alternative 
or the same purpose, based on dynamic programming principles, 
n be extended to handle more general situations (such as the MAP 
n of piecewise constant one-dimensional signals). 

ttion of Two-Dimensional, Binary MRF's. 

istically motivate and develop a new deterministic algorithm for 

ating the optimal Bayesian estimator of two-dimensional MRF's. We 

experimentally, that the quality of the results produced by this scheme is 

eqi^valei it to those obtained by the general Monte Carlo procedure, and the 

ional efficiency (execution time) is improved at least by an order of 



cjse of the MAP estimation of binary patterns, we develop a modification 

imulated Annealing" procedure, which improves its computational 

It is based on the computation of "coarse solutions" (formed by 

aggftgati|ig the elements of the field into blocks) which are then progressively 
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c) 



In 



thisdase, we also develop a heuristic, deterministic scheme whose experimental 
pe:rforn ance is practically equivalent to that of the Monte Carlo procedure, 
and imlroves significantly on its computational efficiency. 



d) 



We pn 

some si 



local co 



iteratior s 



the 



is 



aiscu: 



In 
case wlrtre tie 
known, ind f 
has nevftr beln 
on an 



5. Paral 

An 
develop 



Recc 



Stere o Matching. 



recc tistruction 



sjialys s 
estimation of 
to the <tt)nstiliction 
for the ttecotttruction 
demonstrate 



is th 



nstruction of Piecewise Continuous Surfaces. 



pose a new algorithm for solving the stereo matching problem in 

nple cases. This scheme is based on the direct implementation of the 

istraints (generated by the probabilistic model) in a highly distributed 

coopera :ive network of a particular form: a "Winner-Take-All" network. We 

shjjw rijorously that, for noise-free observations, the state of this network will 

coflvergp to the correct solution, and estimate the maximum number of required 

(which is usually very small). The application of this technique to 

of the depth of real objects from stereoscopic photographs 

sed, and some modifications to the algorithm are introduced, which 

pertnit 1 is to produce results whose quality is comparable to those of other 

smte of the art" algorithms. 

4. Parameter! Estimation. 



le c( ntext 



of the estimation of two-dimensional, binary fields, we study the 

parameters that characterize the field model and the noise are not 

ave to be estimated from the noisy observations, a situation that, so far, 

treated. We present a maximum likelihood procedure, which based 

of the residual ("innovations") process, permits the simultaneous 

the field and the parameters of the system. We apply this technique 

of an algorithm, which does not have any free parameters, 

of piecewise uniform images, and perform experiments to 

s performance. 



el Im Cementations. 



impcjrtant issue regarding the practical value of the algorithms that we 
ir possible implementation in certain non-conventional hardware, 
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such ai mas lively parallel digital machines; hybrid and analog computers, etc. In 
this connect on, we make the following contributions: 



a) 



ana yze 



W: 

ap >roxi 



the parallel implementation of the general Monte Carlo procedure for 

nating the optimal Bayesian estimators. We show that the convergence 

widely used algorithms (such as the Metropolis and Heat Bath 

) cannot be guaranteed in this case. We justify the selection of an 

apfl>ropi ate algorithm (the "Gibbs Sampler"), and present an estimate of its 

putjtional complexity. 



of certdin 
scheme; 
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b) 
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Th; 



We 



Thi: 



Mon 
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the r 



the !e 



also 



e Carlo Procedures. 



istruction of Piecewise Continuous Surfaces. 



parallel implementation of both the modified Monte Carlo procedure 

deterministic algorithm that solve this problem are analyzed, and 

computational complexity is estimated. We also propose schemes for the 

construe ion of hybrid (digital/analog) and analog networks that implement 

pi Dcedures, and perform digital simulations to evaluate experimentally 



the r pe: formance. 

c) : !stim ition of Two-Dimensional Binary Fields. 

Thi cor iputational complexity of the parallel implementation of the fast 
det jrmii istic algorithm that performs this task, is estimated and compared with 
thaW of tie general Monte Carlo scheme. 



propose the adaptation of a class of analog networks proposed by 
Ho >fielc and Tank (1985), so that we can obtain an approximation to the 
opt mal estimate of the field from the equilibrium state of this system. The 
per brmz nee of this scheme is assessed experimentally by means of numerical 
sim tlatic ns. 

3.2. The is O erview. 



thes s is organized in the following way: 
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paralle 



Id chapter two we will introduce the basic concept of a Markov random field; 

compute the corresponding probability distribution, and present Monte 

fj>roce|dures for generating sample functions. In chapter three, we develop 

for the image segmentation and surface reconstruction problems, 

corresponding optimal Bayesian estimators. We also present general 

algorithms Tor computing these estimates, and discuss their implementation in 



show Ht>w tc 

Carlo 

loss fuitictiolals 

and derive the 



These 



esults are applied, in chapter four, to the problem of segmenting 

piecewtee constant images given noisy observations. For the particular case of 

mag ;s, a very efficient distributed algorithm is developed, and we present 

for the case when the model and the noise parameters are not known, 

be estimated from the noisy data. Also in this chapter, we show how 

pftincidles can be applied to the problem of computing the perceptual clusters 

fornled in some dot patterns. 
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Chapter 
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ementin 



papier five, we treat the problem of reconstructing piecewise smooth surfaces 
and noisy data, without blurring the boundaries between continuous 
discuss the use of Markov random field models to embody the prior 
bout the shape and location of the discontinuities, and show how 
general reconstruction algorithms developed in chapter three to this 
also develop a special purpose efficient algorithm for this case, and 
rallel implementation. 



six is devoted to the problem of the reconstruction of depth from 

stereoscopic images. As in the previous cases, we first present a probabilistic 

)f the problem, and extend the general methods of chapter three for 

a solution. Then, we develop special purpose algorithms that improve 

computational efficiency. The performance of these algorithms is illustrated 

bflth synthetic and "real" images. 



Finally, In chapter seven, we summarize our results, and suggest areas where 
future resean h may be fruitful. 
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1. Intrc Jucti >n. 



Tr||j key|to the success in the use of the probabilistic (and in particular, Bayesian) 
the solution of the class of reconstruction problems in which we are 
our ability to find a class of stochastic models (that is, random fields) 
following characteristics: 

Th|^ probabilistic dependencies between the elements of the field should 

lly localized. This condition is necessary if the field is to be used 

surfaces that are only piecewise smooth; besides, if it is satisfied, 

nstruction algorithms will be distributed, and thus, efficiently 

im|j>lemjntable in parallel hardware. 
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class of 
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00 
(iii) Thft relation 
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tme cc rr 
moc sis 



(iv) It tjhoulli be possible to represent the prior probability distribution P f 
exj^icitlj, so that Bayes theory can be applied. 

(v) It ihoulA be possible to specify an efficient Monte Carlo procedure for 
generatii ig sample fields from the distribution, so that the ability of the 
mqael td represent our prior knowledge can be verified. 



Mar 
er, 



we 



Chapter 2 



LOCAL SPATIAL INTERACTION MODELS 



Th|4 clajs should be rich enough, so that a wide variety of qualitatively 
behaviors of the desired solutions can be modeled. 



between the parameters of the models and the characteristics 
esponding sample fields should be relatively transparent, so that 
are easy to specify. 



Foftuna|ely, there is a class of models that satisfies these characteristics: the 

ovian Random Fields (MRF) on lattices. We will describe them in 

and we will also show how they satisfy the required conditions. To 

' nil need two important results: the Hammersley-Clifford theorem, 

relal *d to conditions (iii) and (iv), and the Metropolis and Gibbs-sampler 

algorithms, wfiich will permit us to satisfy condition (v). 
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2. Markov I andom Fields. 

Trie cor cept of a MRF is a direct extension of the concept of a Markov process 
to higher dii lensions and originated in the work of Ising (1925) on the construction 
of models f )r ferromagnetic phenomena. The definition for a two dimensional 
continilibus J 4RF was introduced by Wong (1968), following Levy (1956) (see also 
Dobrushin, .968), and in intuitive terms it says that a random field is Markovian 
ajijiy closed curve that separates the space into two regions, the knowledge of 
the field along the curve, makes the field in these regions mutually 



^ ge 
fielc 



G r i 



for 



for our purposes (since usually we will be interested only in 

the field at the sites of a regular lattice) is the definition of a discrete 

eralization of the concept of a Markov chain. A discrete Markov 

on a finite lattice is defined as a collection of random variables, which 

the sites of the lattice, whose probability distribution is such that 

al probability of a given variable having a particular value, given the 

rest of the variables, is identical to the conditional probability given 

the field in a small set of sites, which we will call the neighborhood 

ite. In formal terms we have the following (see Geman and Geman, 

Woods, 1972 for an alternative definition): 



5 tie a finite set of N sites, and G = {G s , s e S} be a neighborhood 
for S i.e., a collection of subsets of S for which: 
G a ffcr all s e S. 



and only if r e G 9 , for all r, s G S. 



{F a , s e S} be any family of random variables indexed by s e 5, and 
supposed for simplicity, that these variables take values on some finite sets {Q a } 
(the deiinitici can be extended, with some technical modifications, to the case of 
continuous sfcte space). We will call any possible sample realization / : 

[jSi t ' • •) J8N ) i J Si £ Qai 
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ites 1, 2, 3 and 4 are the neighborhood of site j 



ifigftratich of the field Let n be the set of all possible configurations (i.e., the 
:] , and let P be a probability measure in fi. F is a MRF with respect 



= ) I > 0, for all / e n ( (F = /) denotes the event: (F s = /, for 
6 5]). 

= j,\F r = f r rf ±s) = P(F s = f 9 \F r = f r reG,). 

a el 



from this definition , that if the size of the neighborhoods is small, 
>ill satisfy the first condition we required from our class of models. The 
tion of a MRF from this definition (i.e., in terms of the conditional 
probabilities), lowever, is not very convenient because of the following reasons: 



functions defining valid conditional distributions for a MRF cannot 
arbitrarily, since they have to satisfy a set of consistency conditions (that 
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result from flayes' rule; see Besag, 1972), and are, in general, very difficult to specify 
directW) Seondly, although the joint probability distribution Pj can be uniquely 
determined rom the conditional probabilities, its computation is, in general, a 
highly non- rivial task. Finally, there is no obvious intuitive relation between the 
form of the < onditional probability distributions and the qualitative behavior of the 
sample field; 

To over :ome these difficulties, we need an alternative way of defining a MRF. 
This is donelas follows. 

2.1. MaikovjGibbs Equivalence. 



First, wi need the following definition: 



where . 
U(f) is 



is al 
ofth 



system of neighborhoods on a lattice, we define a "clique" C as either 
)r a set of sites of the lattice, such that all the sites that belong to C are 



form the neighborhood of site j, and the cliques are sets consisting 
le sites, or of two (vertically or horizontally) adjacent sites (nearest 



Givfen a 
a single site, 

neighbours df each other. For example, on a 4-connected lattice (Fig. 2), the sites 
1, 2, 3 ^nd * 
either d>f sin 
neighbours; lee Fig. 3). 



The: resi It we are looking for is contained in the Hammersley-Clifford theorem 
(Hammerslej and Clifford, 1971) which states that if F is a MRF on a lattice 
S with respe :t to the neighborhood system G, the probability distribution of the 
configu ratior s (sample functions) generated by it will always have a definite form, 
which is that of a Gibbs distribution: 



Pf(f) = ^e-W> 



normalizing constant, j3 is a parameter, and the "Energy function" 
form: 

c 
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Figure 3 



present 



for all 



where IJV// 
pairs om site: 
some flinctidns, 



where 

and thft potentials 

a 4-coiNnectdld 



Wi:hou loss 



) be! 
Siike F 



O 
i 



a--o 
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(b) 



(c) 



Cliques for the 4-connccted lattice of Fig. 2. 



C ranges over the cliques associated with the given neighborhood system, 
V c (f) are functions supported on them. Thus, in our example of 
lattice, U would be of the form: 






tnd 



N v denote the sets of all horizontal and vertical nearest neighbor 
of the lattice (figure 3 (b) and (c)), respectively, and V a , V h and V e are 



simp e proof of this important result can be found in Besag (1972). We 
here! a brief sketch: 



of generality, we may assume that (the configuration with /,- = 
Ings to Q (otherwise, we simply perform a translation of the origin). 



is a M RF, we have that 



P(0) > 
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P(0) 



step is to note that we can always write: 



P(0) 



= e 



QU) = E tiP Ah) + E E hhPAh, /,-) + 



* J 



+/l- • •fnGij... n (fu • • -fn) 



• any configuration / and any selected site i, we define the configuration 
equal to / everywhere, except possibly at site i, where it is equal to 0: 

/ w = {/i,.../.-i,0, /,-+!,...,/„} 



Using B lyes rule we find that: 



Note tta because 

probabi 

of site i 



No^v, su 
equj 



that 



P(f) _ P(fi\fj,J^i)-P(fjJ^i) 



P(fi\f,J^i) 



= exp[Q(/) - Q(/W)] == 



P(0\f 3 ;j^i) 

= explfidifi) + £ fifjGi,ifi, /y) + . . .] 
i 

of the Markov property, the above quotient of conditional 
:an depend only on the value of / at those sites which are neighbors 



>pose I is not a neighbor of z, and consider a particular configuration / 
1 to everywhere, except at sites % and /. By the above considerations, 



QU) ~ Q(f {{) ) = fiPiUi) + MiGM, ft) 
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depends onl / on /,-, which means that £?,•/(/,-, //) = 0. 



By 



sin 



from 
the 



a 
only 



ilar reasoning, one can show that G itji ... im (f it . . ., f rn ) can be different 
if the sites i, ;, . . ., m are neighbors of each other, i.e., if they belong to 
same clique. The proof is completed by defining: 

~~a W*i • • •» frn) === fit • • 'fm(*i t ...m\fi> • • •fm) 

It s im Dortant to note that whereas the functions defining valid conditional 
probabilities for a MRF cannot be chosen arbitrarily, the form of the potentials 
Vc is not restricted in any way, and can be used freely to specify the required 
the field / (which is what one does in practice). The relation between 
als and the conditional probabilities is given by the following formula 
folio vs from Bayes rule): 

FlFi = fi\F,- 



behaviour ol 
these patent 
(which 



Woods (197: : 
finite lattice 



where 



where $ t - is fie 
which k equbl 



exp[-^c:iecVc(f)} 
'"' * %) Z qeQi exp[-i J2 C:ieC Vb(/f)] 



(1) 



set of allowable values for the state of F i% and f q is the configuration 
to q at site i, and coincides with / everywhere else. 



There aire other ways of representing certain classes of MRFs. For example, 
) has shown that every homogeneous Gaussian MRF defined on a 
;atisfies a difference equation of the form: 

fnm = ^2 ^jfe//n-Jb,m-/ + U nm 
D(P) 

where f nm i the value of the field at site nm and u is a (non-white) stationary 
Gaussian fiel i whose autocorrelation function satisfies: 



E [u nm uqo] = 



c, m = n = 

-hmnc, {m,n)eD(P) 
.0, elsewhere 



D[P) = {(k,l) : 0<fc 2 + / 2 <P 2 } 
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esentation (called a "Conditional Markov" (CM) model by Kashyap 

t nen be used to generate sample functions (Woods, 1972 also presents 

based on the discrete Fourier transform, for the generation of sample 

realizations djf the field u, and for the computation of the joint distribution for /). 

< atisfies a difference equation of the form: 
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3. Generatior 



3.1. The 



E[fnmUkl] 



lo, 



if n = k and m = I 

otherwise 

h k i can be interpreted as the coefficients of the linear minimum mean 
estimator of f mn given its neighbors out to distance P, and u as the 
or. 



that 



} 



re :»rese 



easy 



per 'ectly general: it applies to discrete valued fields, and it can be 
ge leralized to the case of continuous valued ones. 

to generate sample functions from the distribution (we will discuss 
alg^rithips for doing this in the next section). 

posterior distribution is also a Gibbs measure, the optimal 
can be obtained directly from the posterior energy function. 



are independent random variables, is called a "Simultaneous 
e" (SAR) model by Kashyap ( a similar representation can be obtained 

with exponential autocorrelation functions; see Habibi, 1972). Although 
that for any homogeneous SAR model it is possible to find a MRF 

san|e spectral density, albeit with a different neighborhood structure, it 

very difficult to compute the joint distribution explicitly from the 

ttation. On the other hand, the Gibbs representation has the following 



fnm = X) hklfn-k,m-l + W nm 
D(P) 



estimate 

For thesf reasons, this is the representation that we will adopt 



of Sample Configurations of MRFs. 



Met opolis and Gibbs-Sampler Algorithms. 
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Tlie ealiest successful Montecarlo procedure for the generation of sample 
functions o MRF's was developed by Metropolis et al. (1953) for the numerical 
computation of thermodynamic properties of many-particle systems in thermal 
equilibrium To describe it, let us consider a system with TV particles, each of which 
may be in a ly one of a finite number of allowable states. Let /,- denote the state of 
the j th parti :le (we will refer to the TV -vector / as the global configuration of the 
system^, anc let U(f) be the corresponding energy. 

Tike ba ;ic idea of the algorithm is to construct a Markov chain whose states 

correspond :o the global configurations of the system at discrete time intervals 

. It i > a well known fact, from statistical physics, that when the physical 

is at hernial equilibrium at a given temperature T, its configurations will be 

distributed Recording to the Gibbs measure: 



V, 



I,..., 

system 



There fpre, 
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regular 
steps), 



direction, w< 



/ anc 



where 

we consider 



v e 



*(/) 



want 7r(/) to be the invariant measure for our chain. If the chain is 
if it is possible to go between any two states in some fixed number of 
vill be the unique vector satisfying: 



Slice 



*(/) = ^ «Pl 



(2) 



where Jfc is the transition matrix of the chain (see Kindermann and Snell, 1980). 
Alfelo, si 



a system in equilibrium looks the same if we reverse the time 
require that the associated chain be reversible, that is, 

'r(/(n + 1) - j | /(n) = i) = Pr(/(n - 1) = j \ f(n) = i) 



For a rsigula chain, reversibility is equivalent to the "detailed balance" condition: 



*(f)Pc(f,n = *{r)Pc(r,f) 



(3) 



/' are any two global configurations. This condition means that, if 
a large collection of isolated, identical systems, each one in thermal 
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equilibrium at the same temperature (the so called "Canonical Ensemble"), the 
of s|stems going from state / to /' must equal the number of systems going 
to . . This condition is also sufficient for the convergence of the chain to 
desired • jibbs measure. 



number 
from / 
the 



The algorithm proposed by Metropolis generates a regular chain that satisfies 
ii as I allows: 



i ppos 
ran doth 
wit! 



(3). It 

S 
some 
randorji 

as follows 
(i) Choose 

LI 

(ii) 
ofl 

(iii) If 
If 



If 
system 
?y for 



where 



a new state f } - randomly from the set of allowable states using a 
riiforrrlly distributed random number. 

Cq>ppu|e the increment in energy AEj that results from moving the state 
■ h particle from /y to /y. 

< 0, make the move, i.e., set /y = /y. 

> 0, generate a new random number r, uniformly distributed 
bejtlweei and 1. 

Ifli <«-**'/*, set /,• = £•. 

If V[ > C" AE '/ r , leave /y unchanged. 



u 
the; 

AEj 
AEj 



that we visit the particles of the system (i.e., the sites of the lattice) in 
sequential order (for example, we choose the next site to be visited at 
uniform distribution). When a particle j is visited, we update its state 



we c enote by q(f, f) the probability of proposing the state / when the 
is at tate / (i.e., the probability of visiting particle ;', and selecting the state 
it; nc te that q must be a symmetric, irreducible stochastic matrix, so that 
f= g(j , /). by construction), we have that 

Pc(/,/) = 9(/,/)min(l,e-^/ r ) 

^(/,/) = 9(/,/)min(l,e A ^ r ) 

AU = U(f) - U(f) 
Thereft^re, \$AU < 0, 

Pc(f, f) = q{f, f) and P c (f, f) = q(f, f)e AU/T 
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and if i\U > 0, 



Clearly! in bbth cases, (3) is satisfied. 

Thlis is Jot the only chain that satisfies (3). Another possibility is to set: 



which cas i we get the "heat bath" algorithm (see Gidas (1984) and Hastings 



in 
(1982)) 



A 

Geman 
iteration 



only 

the resisting 



systems 



To 
recall 
a 

is in 
the 
energy 



and 
onl] 



this 



Pc(f, f) = <?(/, f)e-* v ' T and Pcil f) = <?(/, f) 



Pc(f,f) = q(f,f) 
= 9(/, /) 



1 



1 + e^l T 



cjiffertnt construction, called the "Gibbs sampler" has been proposed by 

Geman (1983) (see also Besag (1972)). In this scheme, too, at each 

one site is modified; its new state, f 3 - is selected at random from the 

conditional distribution given by equation (1). These authors show that provided 

that we ceep visiting every site, (i.e., that we update its state "infinitely often") 

chain is ergodic, and its invariant measure is given by (2) (note that 

reversityjlity s not required in this case). It is not difficult to see that for binary 

i nethod is equivalent to the heat bath algorithm. 



3.2. Statistic: 1 Mechanics Interpretation. 



get in intuitive grasp on the way these algorithms work, it is useful to 

some r suits from statistical mechanics (see, for example, Reif, 1965). When 

macroscopi : system (i.e., a system with a large number of degrees of freedom) 

thermal equilibrium at a given temperature T y its state / will be such that 

Gibbs fre e energy F is minimized. The relation between F(f) and the internal 

j|(/) >f the system is given by: 

*■(/) = U{f) - TS 



31 



where he ei tropy S is: 



and 
equal 



fl([/) 



to 



From 
adopt 



will 
S), wh 
states. 



The 



Z is 



(or 
is very 



To 
lattice, Whos( 



with 



is 
U. 



a 
e at 



ti is relation it is clear that at high temperatures, a system in equilibrium 
disordered, high energy configuration (which will have a high value of 
ow temperatures, the dominant tendency will be towards low energy 

probability distribution of the equilibrium energy is given by: 

Pu(U) = je-WtW) 



where 

negativ^ exponential 

value I 

of freeclbm 

be inversely 



:lose 



illus rate 



S = In Q(U) 



the total number of feasible configurations of the system with energy 



constant. Since fi(-) is a rapidly increasing function of U, and the 

is rapidly decreasing, P\j will be sharply peaked around a 

Using the fact that Q(U) = 0(E/ n ), where n is the number of degrees 

df the system, one can show that the relative width AU of this peak will 

Droportional to the square root of n: 



AU 
U* 



1 

y/ri 



(This result holds, in fact, not only for the energy, but for other related 
thermo ilynai lical properties as well). This means that, for large n, the Metropolis 
Gibps saippler) chain will generate (asymptotically) configurations whose energy 
o U*(T), which is an increasing function of T. 



this, let us consider a binary system on a four-connected square 
energy function is given by: 

u{f) = fEvcVi,fi) 



VcU, 



;,/,. = {-"• 



if A = /y 
otherwise 
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Figure 4, Sample patterns of the two-dimensional Ising model at 0.8 (left), 1.0 (center) and 
1.2 (right) tim is the critical temperature. 





where 



dimensional 



$ radges over all the nearest neighbor cliques of the lattice (this is the two 
Ising model with "free boundaries" — since the only interactions that 
contribute t< > the energy are those between elements of the field that belong to the 
lattice W- wl ich we will later discuss in detail). 



In 
different 
The 
for thii 



figuie 



4 we present typical equilibrium configurations generated at three 

tenlperatures using the Metropolis algorithm with random updating order. 

terpen tures used correspond to 0.8, 1.0 and 1.2 times the critical temperature 

moiel (the critical temperature is defined as the maximum value of the 

for which the effect of fixed conditions at the boundary of a square 

it the center, no matter how large the lattice is. For the two-dimensional 

it equals 2.273). 



temperature 
is felt 



lattice 
Ising nfodel 



In 



the 



proportional 



regions, 



mit of very large lattices, the equilibrium energy per spin (which is 
to the total length of the boundaries between "black" and "white" 
is g|ven by (see Wannier, 1959): 

^P- = -|cothi[l ± £(1 - a*)'/'tf(a)] 



n 
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Figure 5. 
net (from 



Equilibrium values of the energy (a) and average density (b) for an infinite Ising 
Wanhicr, 1959) 



take 



where we 
constant; a 1 



and K(-) is 
Hildebrand, 



From a 
only free 
cluster densit 



Other 
of them) 
where they ar 



(a) 



T/TV 



(b) 



the + or - sign, above and below T e , respectively. A; is the Boltzmann 
given by: 

— 2smh (!/ r ) 

~" cosh 2 (l/T) 

the complete elliptic integral of the first kind (see, for example, 
976). 



The avejage density of "black" elements can be computed by the expression: 
The shaj e of these functions is illustrated in figure 5. 



ualitative viewpoint, one can see that the temperature, which is the 
parameter of this model, controls the granularity (average cluster size and 
) of the sample patterns. 



examples of patterns generated with these algorithms (or some variations 

may|be found in Cross and Jain (1983) and Hassner and Sklansky (1980), 

used as models for texture; in Geman and Geman (1983) as models 
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for piecewis 
produce mole 



constant images, and in G reminder (1983), where they are used to 
complex patterns. 



3.3. Continuous Valued State. 

Any of the two algorithms presented in section 3.1 can be generalized to the 
case where he state of each particle can take any real value on a compact set 
(e.g., a closd I interval) at the expense of their computational efficiency. A different 
approach th; t seems promising is based on the fact that a vector / which obeys the 
stochastic differential equation: 



where w is a 
Brown ian mfotion 
U, distributed 
(see Grenanier 
numerical 
patterns. Thi 
in a numericfel 
identically 



vector Wiener process with unit variance (a collection of independent 

processes), will be, under suitable smoothness conditions on 

asymptotically (as t t oo) according with the Gibbs measure (1) 

1984; Geman and Hwang, 1984). This means that we can use a 

simulation of (4) (see Wong and Zakai (1965)) to generate the desired 

approach has two interesting advantages, that result from the fact that, 

simulation, the increments dw are approximated by independent, 

diltributed Gaussian random variables: 



(i) We only 
algorithr is 

(ii) All sites 



The pro 
any given 
of partial 
Karlin and T 
the rate of 
way. 



df = -gradU(f)dt + \/2Tdw 



(4) 



need to generate Gaussian random numbers, for which efficient 
exist 



can be updated at the same time, so that efficient parallel 
implemelitations can be adopted. 



ability distribution of the configurations generated by the system at 

tiihe can, in principle, be obtained by solving an appropriate system 

difl erential equations (i.e., the Kolmogorov equations; see for example, 

tylor, 1981); this will not be practical in most cases, however, so that 
coj|ivergence of this algorithm will have to be assessed in an experimental 
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We will now describe how an extension of the techniques presented in this 
section can >e used to find the global minimum of arbitrary energy functionals. 
As we will s low in the next chapter, this method will be particularly useful for 
minimizatior in the variational principles which represent the Maximum a Posteriori 
estimated so! ution to a reconstruction problem. 



4. Simulated 



ilat( d 



Simu 
for the sol 
that any cos] 
finite set, can 
corresponds 
the Metropolis 
becomes a p 
Gibbs 
impulses) 
system in 
/ that minim 

seriAus 



measi re 



tine 



One 
a very long 
high temperajure 
slowly cool 



tie 



4.1. Discrete 

Geman 
at the rate: 



where n is th< \ 
Gibbs sample 



Annealing and Global Minimization. 



annealing is a new technique, developed by Kirkpatrick et al (1983) 

utlon of combinatorial optimization problems. It is based on the idea 

functional of N variables, each of which can take values on some 

be considered as the energy function of a physical system whose state 

:o a particular value of these variables. Therefore, we can use, say, 

algorithm to generate, at any given "temperature" T (which now 

rameter of the optimization process) samples from the corresponding 

. Since as T | this measure converges to an impulse (or set of 

coifresponding to the state (or states) of minimum energy, the state of the 

thermal equilibrium at zero temperature will correspond to the value of 

zes U(f) globally. 

difficulty, however, is that attaining thermal equilibrium might take 
at low temperatures. Kirkpatrick's idea was to start at a relatively 
(where thermal equilibrium is reached very fast), and then, to 
system, until "freezing" occurs and the state stops changing. 



Valued State. 



<! : Geman (1983) were able to show that if the temperature is lowered 



log(n + 1) 



(5) 



number of iterations, and C is a constant, this algorithm (using the 
) will in fact converge (in probability) to the set of states of minimal 
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energy. The} 
for any real 



where fi is Ihe set of allowable global states. This means that we can use time 
averages to e timate ensemble averages. Similar results have been obtained by Gidas 
(1984) for th^ Metropolis and heat bath algorithms. 



The mil 
can be 
and Geman 



imal value of the constant C in equation (5) for which convergence 
guaranteed has not been determined in general. The value found by Geman 



where N is 
difference in 
only one site 
applications 
C can be 



where A' is 
(in the sense 
that this 



rece it 



In a 
in terms of 
states of the s 
is approxi 
In some 
of theprobl 
of the system 



For the < 
and error procedure 



also showed that this chain is asymptotically ergodic in the sense that 
valued function Y of the global state at time t, f(t), we have: 



n\oo 71 £_i Jli ' 



C = NA 



he total number of sites in the lattice, and A is the largest absolute 

energies associated with pairs of global configurations that differ at 

This value, however, is too large to be of any practical use in most 

3idas (1984) has shown that if U has not more than two local minima, 

cor tputed as: 



c =;v 



tie 



minimal energy change between a local minimizer and a neighboring 
that it differs at exactly one site) configuration. He also conjectures 
expif ssion holds in general, but this result has not been confirmed. 



paper, White (1984) characterizes the initial annealing temperature 

standard deviation of the "density of states" (the number of possible 

stem, per unit energy, for each value of the energy) when this function 

Gaussian (which seems to be the case for a large class of systems). 

particular cases this value can be determined analytically from the structure 

, but in general, it has to be computed numerically from a simulation 

at high temperature. 



th; 



ma i\y 



en 



lass of systems in which we are interested, we have found, by a trial 
, that a value of C equal to 1.5 times the natural temperature of 
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the system ( 
MRF model 
iterations), b 
in this area. 



tie 



Another 
simulated an 
AUj associat 
comes from 
only the state > 
the sites of thi 
color. In a 
the sites that 
of colors nee 
graph that d 
below by the 
the minimun 
state of the 
some particu 



All the 
the case whe 
this set is inl 
solutions by 
as we increase 
generalize 
time dependent 
functions thai 
Hwang, 1984 



e., the temperature associated with the Gibbs distribution of the prior 

produces a reasonable convergence behaviour (of the order of 500 

it clearly, more research, both theoretical and experimental is needed 



important factor which determines the computational efficiency of 

ealing is related to the difficulty in computing the increment in energy 

d with a change in the state of the j th variable. If the energy function 

s probability measure of a MRF, the computation of AUj will require 

of the variables in the neighborhood of j. Suppose now that we color 

lattice in such a way that any two neighbors will always be of different 

parallel implementation we can, in principle, update the states of all 

are of the same color in a simultaneous way. The minimum number 

led to satisfy this condition is called the "Chromatic Number" of the 

scribes the neighborhood structure of the MRF, and it is bounded 

size of the largest clique of the system. This number, then, determines 

number of steps that are needed in a parallel machine to update the 

vlhole lattice. We will analyze these implementations in more detail for 

ar examples in the next chapters. 



4.2. Continue us Valued State. 



a /ailable convergence results for the annealing algorithm hold only for 
e the set of allowable values for the state of each variable is finite. If 
nite, but compact, we can still use these results to find approximate 
liscretizing it. However, the computational complexity will increase 
the resolution of this discretization. An attractive alternative is to 
; approach discussed in section 2.2 by making T in equation (4) 
A convergence proof for this modified scheme, for smooth energy 
satisfy appropriate boundary conditions, can be found in Geman and 
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5. Discussio 



ha\e 



presented a class of probabilistic models with local dependencies 

represent prior generic knowledge about the solution of a reconstruction 

tli 2 class of MRF's on finite lattices. We have seen how they can be 

>ecified by defining arbitrary "potential functions" which are supported 

es of the associated neighborhood system. It is thus easy to define 

fiblds with a wide range of different behaviors. For example, if the only 

knowledge that we have is that the reconstructed surface should be piecewise 

may use a 4-connected lattice with Ising potentials: 

f-l, if |i-y| = l and A = / y 
1, if \i-j\ = 1 and fi?£ fj 

lo, otherwise 

In this Jase, the natural temperature of the system will index a one parameter 
family of fields with varying degrees of granularity. 



We 

which can 
problem: 
completely s 
on the cliqt 
families of 
prior 
constant, we 



Smooth 
with quadratic 



5, an 

to use a MR 

construction, 

defined on a 

two MRF's 

them. 



We shov 
a MRF has 
in thermal ec 



Vcifi, //) = « 



surfaces can be modeled using the same neighborhood system, but 
potentials: 



Vc(/ 



lo, 



(fi-f 3 )\ if K-;| = i 

otherwise 

More cc mplicated, non-isotropic patterns can also be modeled, using slightly 

larger neighl orhoods (as in Cross and Jain, 1983). Also, as we will see in chapter 

approp riate choice of the lattice and the neighborhood system, permits one 

7 to model sets of piecewise smooth curves on the plane. Using this 

it is possible to model the behavior of a piecewise smooth function 

two-dimensional lattice (a "piecewise smooth surface") by coupling 

Dne for the smooth portions, and another for the curves that bound 



ed how the probability distribution of the configurations generated by 

tl e same form as the one associated with a macroscopic physical system 

uilibrium, so that one can use Monte Carlo procedures that simulate 
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the behavio of such systems to generate sample functions of arbitrary MRF's. The 
Markovian iroperty of the models imply that the computations performed by these 
procedures ire local in nature (the updating rule for each site depends only on 
the states of its neighbors), so that, in principle, efficient parallel schemes can be 
designed fol their implementation. We will examine this question in detail in the 
next chapte , where we discuss the use of MRF models and Bayes theory for the 
optimal soli don of reconstruction problems. 
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1. Introducti >n. 



The use 

requires the 

(i) A prior 

(ii) Stochas 

(iii) Approp 

(iv) Estimators 

(v) Efilcien 



In the 
probabilistic 
constraints about 
develop the 



First of 
and present 



Chapter 3 



OPTIMAL BAYESIAN ESTIMATORS 



of the Bayesian approach for the solution of reconstruction problems 
development of the following items: 
robabilistic model for the functions to be reconstructed, 
ic models for the observation processes, 
iate loss (error) criteria. 

that are optimal with respect to (i), (ii), and (iii). 

algorithms for the computation of these estimates. 



Drevious chapter, we discussed item (i), and presented a class of 

models that can be used very effectively to encode prior generic 

the solutions of reconstruction problems. In this chapter we will 

emaining necessary ingredients that are necessary to perform optimal 

reconstructidns in the general case. 



ill, let us formulate the class of problems of interest in a precise way, 
general stochastic model for the observation process. 



2. Problem Hormulation. 

We mentioned in chapter 1 that there is an important class of perceptual 
problems wrjose solution can be found by reconstructing a function f : R n *-+ 
R m on a fin te set of points that lie inside a compact domain fi C R n . Although 
the methods Jiat we will develop are, in principle, perfectly general, for the sake of 
clarity we w|l confine ourselves to the important particular case when n = 2 and 
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m = l. We 11 



each one of Lhe N sites 



re, therefore, interested in reconstructing the value of a function / at 
of a lattice L (we will denote die value of the function at 



site ieLby 



2.1. Stochas ic Model for the Observations. 



Let us 
of L, and 
by: 



Assume that we have a set of observations g on a subset S of the sites 
the process by which these observation are obtained can be modeled 



tint 



Here , 
general nor 
invertible 
for examp 
linear 
distributioi 
also that it 



gy = ¥(ffy(/),ny) , jes W 

ffy(-) is an operator with local support that represents some kind of (in 

-invertible) degrading operation (such as blurring); * is an operation 
ith respect to ny (so that n y = ^'(gy, #;(/))* k ma ^ re P resent ' 

:, noise addition or multiplication followed by a memoryless non- 
[ ion. ny represents a scalar noise process with known probability 
P nj . We will assume that ny is independent of n it for all i ^ j, and 

is independent of /. 



trans formation. 



Given 
be given 



hi 



As an exa 
have: 



/.•). 



/, the conditional probability distribution for the observations P glf will 



t"€5 

Assuming that P„,(n.) > for all i, and all possible values of*, we can define 
the functions *; by: 

*,-(/, ft) = - In P».-(* _, (w. »<(/)) (2) 

so that we| can write the conditional distribution as: 

i»,l/(ffj/) = «Pi- !>(/.«)] (3) 

nple, consider the case of additive , zero mean white Gaussian noise. We 



Hi(f) = fi 
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2.2. Posteriojl Probability Distribution. 

Since wfl are using a MRF model for the field /, its prior distribution will be 
of the form: 



with 



where C ran 
Using 



Using 
a constant 
also follow a 



fcr 



with 



Where Z P is 



*(a,6) = a + b 



Pni[x) 



2 /o_2i 



v/27T 



cxp[-x72<r] 



ircr 



Pg\/{91 /) = II ~=r «pH/.- - w)72^ a ] = 

ies V27TO- 
= exp[- £{ln(V£Fa) + -L(/ t - - g t ) 2 }] 



t'e5 



2a 2 



^/(/)=^ex P [-^C/ (/)] 






(4) 



es over the cliques of the neighborhood system of /. 
rule, we find that the posterior distribution is: 



Biyes 



Pf\ g (f>9)- JTQ 



tie 



expressions (3) and (4) for P f and P g \ ft and recognizing that P g {g) is 
a given set of observations, we get that the posterior probability will 
Gibbs distribution: 



P f \ g (f;g) = ^eM-Up(f;g)] 



a constant, and the functions <I> t are defined by (2). 
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(5) 



(6) 



We can 
considering 
in thermal e( 



t mt, 



the behavior 
field whose 
noise varianq 
fields. This 



low provide a physical interpretation of the posterior distribution, by 

, while the prior distribution (4) describes the behavior of a free field 

uilibrium (see section 3.2 of chapter 2), the distribution (5) describes 

of the same field coupled with a fixed (but spatially varying) external 

is given by g. The functions <I> t , whose magnitude depends on the 

, can then be interpreted as the coupling strengths between the two 

coupled system is also Markovian, and if 



vilue 



Since 
minimizer ol 
standard 
quadratic 



//,■(/) = //,-(/,•) for all i € S 



its neighborh 3od structure will be identical to that of the original field. 

The imp ortance of this interpretation lies in the fact, which will be proved 
in the follow ing sections, that the optimal estimate for / can be obtained simply 
by observing the equilibrium behavior of this coupled field. Before considering this 
question in c ;tail, let us define the appropriate cost functional for the applications 
we are intere ;ted in. 

3. Cost Funclionals. 

The Ba>|bsian approach to the solution of reconstruction problems has been 
adopted by s< veral researchers. In most cases, the criterion for selecting the optimal 
estimate has ?een the maximization of the posterior probability (the Maximum a 
Posteriori or VlAP estimate). It has been used, for example, by Geman and Geman 
(1984) for th \ restoration of piecewise constant images; by Grenander (1984) for 
pattern recon jtruction, and by Elliot et. al. (1983) and Hansen and Elliot (1982) for 
the segments ion of textured images (a similar criterion — the maximization of a 
suitably defir ed likelihood function — has been used by Cohen and Cooper (1984) 
for the same purposes). 

th; 



i use of this criterion defines the optimal estimator as the global 

the posterior energy U P (equation 6), it is closely related to the 

regujarization method that we discussed in chapter 1. Indeed, if we assume 

pojentials for the prior MRF model, the term U (f) corresponds to a 
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smootl mess 



global 

are corrupted 
so that Up w 
models, the 
regularizatior 
on a purely 
very similar 
however, the 
it provides a 
verifying the 
sample fields' 



assumption (the "stabilizing functional"), and if the observations 

by additive Gaussian noise, the term £ <!>,(/, &•) will also be quadratic, 

11 have a unique minimum. For more general prior and observation 

vlAP estimator may be considered as an extension of the standard 

approach. Thus, the variational principle proposed by Blake (1983), 

dlragmatic basis, for the reconstruction of piecewise constant images is 

b the one derived by Geman and Geman (1984). Even in this case, 

precise probabilistic formulation in the latter case is preferable, since 

precise interpretation of the parameters, and a practical means for 

adequacy of the prior assumptions (via the experimental analysis of 



In some 
mean square* 
of fields. For 
functions, coflrti 
Habibi (1972 
for solving th 



The min 
been used as 
case. We will 
for the follov^ng 
(i) It permit > 
(ii) It is in cllser 



We will 
segmentation 



Consider 



other cases, a performance criterion, such as the minimization of the 
error has been implicitly used for the estimation of particular classes 

xample, for continuous-valued fields with exponential autocorrelation 
ipted by additive white Gaussian noise, Nahi and Assefi (1972) and 

i have used causal linear models and optimal (Kalman) linear filters 

i reconstruction problem. 



mization of the expected value of error functionals, however, has not 
an explicit criterion for designing optimal estimators in the general 
show that this design criterion is in fact more appropriate in our case, 

reasons: 
one to adapt the estimator to each particular problem. 

agreement with one's intuitive assessment of the performance 



of an estimator, 
(iii) It leads t d attractive computational schemes. 



now propose design criteria for two particular problems: image 
and surface reconstruction. 



3.1. Error Cri erion for the Segmentation Problem. 



a field / with N elements each of which can belong to one of a finite 
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set Qi of classes. Let /,- denote the class to which the i ih element belongs. The 
segmentatior problem is to estimate / from a set of observations {g\, . . .,g p }. Note 
that /,- does lot necessarily correspond to the image intensity. It may represent, for 
example, the texture class for a region in the image (as in Elliot et. ah, 1983), etc. 



A 

elements tha 
error e 9 as: 



reasonable criterion for the performance of an estimate / is the number of 
are not classified correctly. Therefore, we define the segmentation 



where 



3.2. Error Ci iterion for the Reconstruction Problem. 



In this 
on finite sets 
of an image 
should be o 
total square< 



case 



will be a 
Let us 



To deri 
first present 
which states 
are known, 



N 



«=1 



JiJitQi 



*( 



-c 



if a = 
otherwise 



(7) 



(8) 



, we also consider a field / with N elements which can take values 
{Qi}* but now we assume specifically that /,- represents the intensity 
[or the height of a surface) at site i. This suggests that an estimate / 
nsidered "good" if it is close to / in the ordinary sense, so that the 
error: 



N 






(9) 



reasonable measure for its performance. 

derive the optimal estimators for these error criteria. 



mw 



4. Optimal B lyesian Estimators. 



ne 



the optimal estimators with respect to the criteria stated above, we 
he general result (which can be found, for example in Abend, 1968) 
|hat if the posterior marginal distributions for every element of the field 
optimal Bayesian estimator with respect to any additive, positive 



i ie 
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definite cost 
expected cos 



In more 



with 

We will 
to belong tc 
is stra 

functional C 
possible / 



if a = b 

if a 7^ b , for all i 

assume that the value of each element / t of the field / is constrained 

some finite set Q t (the generalization to the case of compact sets 

ightfoflward). The Optimal Bayesian estimator / with respect to the cost 

is defined as the global minimizer of the expected value of C over all 

9' 

cC/) = f f c(f,f)dP f M>9) = 

J J>9 



ai d 



We now hav ; 



Theorem 1: 



The optimal 
C can be foilnd 
element, i.e., 



unctional C may be found by independently minimizing the marginal 
for each element. 

precise terms, we will consider cost functionals C(f,f) of the form: 



C(f, f) = E Ci(f it fi) 



(10) 



f=0, 



inf J fg C(f,f)dPfM>9) 



(11) 



estimate of a field / with respect to the positive definite cost functional 
by minimizing independently the marginal expected cost for each 



A- = q : E Ci{r, q)Pi(r \ g) < £ C^r, s)Pi(r \ g) 

for all s ?£ q and for all % £ L. 

Pi(r | g) is th i posterior marginal distribution of the element i: 



Pi(r\g)= E P/\,{fi9) 

f'fi=r 
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(12) 



Proof: 



First, we not* 



where P g (g) i > 



The opt mal 
derived from 



and get that 



that since C is positive definite, and since 

PfM^) = P/\ g (f',g)P g (9) 
a constant for a given set of observations, we can write, from (11): 

£ C{f, f)P/\ 9 if; g) = inf £ C[f t f)P flg (f; g) 
f f f 



Using (10), \Je rewrite the right hand side as: 

infEE^(/-A)^/| ff (/;fif) = 

/ / *• 

= infEE^(/.-.^/if(/;ri = 
/ t / 

= inf££ £ Cir.JAPfrifig) 

f i rEQi f:fi=r 

From (12), wje find that this expression is equal to: 

f i r£Qi 

which, since \j is positive definite, we can rewrite as: 

Einf Y,Ci{r;h)Pi{r\g) I 



estimators for the error criteria defined in section 3, can be easily 
this result: 



In the c ise of the segmentation problem, we put 



Ci{fu fi) = 1 - *(/« - fi) 



E(i-%,/ t )A(r|g) = i-P,(A|g) 

reQi 
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and therefore 



We will call 
For the 



t us 



\Q> 



now, 



implies that 



or equivalent y, 



where 



tha 



Note 
element, or 
we may assume 



/,•=?€ Qi : PM I g) > P& I g) 

for all s ^ q (13) 

estimate the "Maximizer of the Posterior Marginals" C/mpm)- 
construction problem, we set: 



£ (r - q?Pi{r | y) < £ (r - sfPfr \ g) 
reQi reQi 



■2qf + q 2 < -2sf + s' 



(r - qf <(f- sf 



r = E rp i( r I g) 

reQi 



so that the oj timal estimate is: 



fi=qeQ< : Cfi-q) 2 <Cfi-sf 

for all s ^ q 



We will call t lis estimate the "Thresholded Posterior Mean" Qtpm)- 



(14) 



tie 



these results still hold if the sets Q t of allowable values for each 
individual cost criteria C t are not the same for all i. In particular, 
that the index i varies over the union of two lattices: 

ieLi\jL2 
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and let the fi 
and at the si 
the direction 
and Geman 
now define 



for any posi 



Id at the sites of L, represent the height of a pieccwise smooth surface, 
es of L2. take an integer value to indicate the presence (and possibly 
of a boundary between two adjacent continuous patches (see Geman 
1984; we will explain this construction in detail in chapter 5). If we 
mixed error functional: 

e,„(/,/)=E(/.-/.) 2 + xE(i -Wi-ti) 



ive value of X, the optimal estimate will be: 

i € L\ 
i€L2 






The mlin obstacle for the practical application of these results, lies in the 
formidable < omputational cost associated with the exact computation of the marginals 
and the met n of the posterior distribution given by (5), even for lattices of moderate 
size. In the next section we will present a general distributed procedure that will 
permit us tc approximate these quantities as precisely as we may want 

5. Algorithifs. 

The allorithms that we will propose are based on the use of the Metropolis or 
Gibbs Sam >ler schemes that we presented in chapter 2, to simulate the equilibrium 
behavior of the coupled MRF described by equation (5). We recall that the Markov 
chain genei ated by these algorithms is regular, and their invariant measure is the 
posterior dstribution P /|f . The law of large numbers for regular chains (see, for 
example, 1 emeny and Snell, 1960) establishes that the fraction of time that the 
chain will pend on a given state / will tend to P /|f (/; g) as the number of steps 
gets large, Jidependently of the initial state. This means that we can approximate / 

by: , „ .. 

(15) 






w 



and the po >terior marginals by: 



K — Tl t=k 



(16) 
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where f® i: the configuration generated by the Metropolis algorithm at time t, 
and k is the Lime required for the system to be in thermal equilibrium. From these 
values, f MI , A and f TI > M can be easily computed using (13) and (14). 

This pr >cedure is related to the use of simulated annealing (see section 4 of 
chapter 2) 1 3r finding the global minimum of U P (i.e., the MAP estimate: see 
Geman and Geman, 1984). In our case, however, we are interested in gathering 
statistics abc ut the equilibrium behavior of the coupled field at a fixed temperature 
T = 1, ratli :r than in finding the ground state of the system. This fact gives our 
procedure s< ime distinct advantages: 

1. It is difficult to determine in general the descent rate of the temperature 
(anneal ng schedule) that will guarantee the convergence of the annealing 
process in a reasonable time (it usually involves a trial and error procedure). 
Since v e are running the Metropolis algorithm at a fixed temperature, this 
issue b< comes irrelevant 



sen;e 



2. Sine 

the vali 

the 

refined 

thatthi 

inverse 

Them; 
problem, a 
per elemen 
posterior 



: in our case we are using a Monte Carlo procedure to approximate 
es of some integrals, we should expect a nice convergence behavior, in 
that coarse approximations can be computed very rapidly, and then 
to an arbitrary precision (in fact, it can be proved (see Feller, 1950) 
expected value of the squared error of the estimates (15) and (16) is 
y proportional to n). 

in disadvantage of this procedure is that in the case of the segmentation 
large amount of memory might be required if the number of classes 
m is large (we need to store the N(m - 1) numbers that define the 
rginals). 



m i 

With r aspect to the relative performance, we point out that in many cases, 
particularly for high signal to noise ratios, the MAP estimate is usually close to the 
optimal on( . If the noise level is high, however, the difference in the performances 
of the two < stimators may be dramatic. This is illustrated in the example portrayed 
in figure 6: panel (c) represents the MAP estimate of the binary MRF (a) from the 
noisy obser /ations (b); it is clear that the approximations to the MPM estimates 
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par z\s 



shown in 
intuitive ex 
is implicitly 
equal to zero 
to noise ratid 
error will be 
noise situatioti 
from its 

MPM estimator 
making a 
return to th 
estimates in 



p ination 



(d) and (e) are better than the MAP from almost any viewpoint. An 

for this behavior comes from the fact that the MAP estimator 

Hhinimizing the expected value of a cost functional Cmav{!, f) which is 

only if /,• = ft for alii, and is equal to, say, M otherwise. If the signal 

is sufficiently high, the expected value of the optimal segmentation 

very close to zero, so that / M pm ^d Jmap w M coincide. In a high 

, however, the MAP estimator will tend to be too conservative, since 

it is equally costly to make one or one thousand mistakes. The 

, in contrast, can make a better (although more risky) guess, since 

mistakes has only a marginal effect on the expected cost. We will 

example, and analyze in detail the relative performance of both 

next chapter. 



viev, 3oint 



fev 



i le 



6. Computati mal Complexity and Parallel Implementations. 



We hav< 
large class oi 
of the Markjv 
section, we 



(i) Which o 
the 



We will 
Thinking Mafchines 
the "Connection 
time of these 



6.1. Serial C< mplexity. 



Suppose 
(Metropolis, 



seen how the optimal solutions of reconstruction problems , for a 
cost criteria, can be obtained from the observation of the evolution 

chain generated by the algorithms presented in chapter 2. In this 

discuss the following questions: 

these algorithms is the best one to use on a serial machine, from 
viewpoint of the computational efficiency. 



viill 



(ii) Which o le is best suited for an implementation in parallel hardware. 



dso describe a parallel machine that is currently under construction at 
Corporation and at the MIT Artificial Intelligence Laboratory: 
Machine" (Hillis, 1985), and present estimates for the execution 
algorithms in that particular piece of hardware. 



we are running our algorithms on a serial machine. In the three cases 
-leat Bath and Gibbs Sampler), we first have to select the next site 
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(a) 



(b) 



(c) 





(d) 



(e) 



Figure 6. a) Sample function of a binary MRF. (b) Output of a binary symmetric channel 
(error rate: QA I (c) MAP estimate, (d) Monte Carlo approximation to the MPM estimate, (e) 
Deterministic a iproximation to the MPM estimate. 



whose state 1 
in the poster 
element by tl 



ff B g E ( v c(f {9) ) - V c (f)) + *,(/<<>, ft) - ♦,(/, 9i ) (15) 



where 



as to be updated. Assume it is site i. Let AU q denote the increment 
or energy associated with replacing the value of the state of the I th 
e value q. Using (6) and the expression for U of (4), we get: 



\q, j = t 

Let C(AU) denote the computational cost of evaluating (15). 



(16) 
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The necessary 
(see section 
(i) Select tl 



random 
a table, 

(ii) Compu 

(iii) Check i 

(iv): 

(iv) Compu 

(v) Genera 



number in the range (0, |Q t |], with cost C{pm), and load q from 
with cost C(load)). 



(vi) Compaje it with exp[-AU q ], 

Therefdre, we have that the total updating cost for the Metropolis scheme, C M , 



satifies: 



For the 
deleted. The 



a a 



(v) 

exp[- 

(vi)Ifr 
Theu 



Pi 



steps for updating the state of site % are, in the Metropolis scheme 

.1 of chapter 2): 

e candidate state q from the set Q t (generate a uniform pseudo- 



e AU q . 



AU q > (cost: C{comp)). If not, set /,- = q. Otherwise, go to 

e exp[-AU q ] (cost: C{exp)). 

e a new uniform pseudo-random number in the range (0, 1). 



C M > C{AU) + C(prn) + C(comp) + C(load) 

C M < C{AU) + 2C(pm) + C(exp) + 2C(comp) + C{load) (17) 

Heat Bath scheme, steps (i), (ii) and (iv) are identical, and step (iii) is 
remaining steps are in this case: 

a new uniform pseudo-random number r in the range (0, 1 + 

til 

> 1, set U — q\ otherwise, leave /, unchanged, 
ating cost for the Heat Bath scheme, C H b is then: 



Gei erate 



Chb = C{AU) + 2C(prn) + C{exp) + 



C(comp) + C(add) + C{load) 
and in teneral, it will be higher than Cm. since 



(18) 



C(exp) > > C(comp) 

For thej Gibbs Sampler, we select the new state by generating a pseudo-random 
number which takes values on Q t -, with probabilities given by the conditional 
distribution (equation (1) of chapter 2). To do this efficiently, we rewrite this 
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equation as: 



(Note that A 7 ft = 0). 

Let Qi = = {qi, . . .,9m}- We now generate an array a, by putting: 

ao = 



The new statl 
r in the rang 



The comput itional cost will be: 



Cgs 



note that we 
array a. 

If AT is 
estimate, the 



td 



where C(seZ 
state is goinj 
pseudo-rand >m 
requires onl> 
using a 



PM I /) 



_ cxp[-AC/g] 
Eretf, exp[-AC/ r ] 



ay = ay_i + exp[-At7,J , ; = 1, . . ., M 

fi is now computed by generating a uniform pseudo-random number 
(0, a M ], and putting: 



(M - 1)[C(AU) + C(exp) + 2C(adrf) 4- 4C(^ad) + C(comp)] + 

+C(prn) 



(19) 



are including the overhead cost incurred by the use of the auxiliary 

he size of the lattice, and we perform n iterations to compute our 
total cost will be: 



C T = N • n • (C(update) + C(select) + C(overhead)) 



(20) 



£) is the cost associated with the selection of the next site whose 

to be updated. This selection involves the generation of 2 uniform 

numbers in the first two cases, whereas for the Gibbs sampler it 

a couple of additions, since in this case we can select the next site 

deterlninistic rule, such as lexicographic order (see section 6.3 below). 
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Since 
Metropolis a 
as the size o 
needed to ge 
faster in the 
latter case w< 
sampling of 



C^ipdate) is the dominant cost, apparently one should conclude that the 

gorithm is the most efficient. It must be considered, however, that 

the state space (i.e., M = |Q t |) increases, the number of iterations 

an estimate with an equivalent degree of precision will increase much 

Metropolis or Heat Bath cases, than in the Gibbs sampler, since in the 

are using an "importance sampling" procedure, versus the uniform 

Hie former (see Hammersley and Handscomb, 1965). 



A rigoro 
on the natun 
needed to cl 
parallel im 
We will justi 



6.2. Parallel 



is analysis of the tradeoffs involved is not easy, and is highly dependent 

of the particular problem, so that an experimental analysis might be 

rify these questions in each case. In the more interesting case of a 

pigmentation, however, the Gibbs sampler becomes the obvious choice. 

y this assertion in the following sections. 



Jpdating. 



A neces: ary condition for the convergence of the probability measures of the 
Markov chai ts defined by the Metropolis, Heat Bath or Gibbs Sampler algorithms 
to the Gibbs measure is that if two sites belong to the same clique, they are never 
updated at tl le same time. As we will show in the next section, this condition is 
also sufficien only for the case of the Gibbs sampler. In this case it is possible to 
update simul aneously the states of all non-neighboring sites, by implementing the 



algorithm in 



a parallel architecture in which a processor is assigned to each site. 



The total execution time will then be reduced by a factor of 

N 
K 

where K isj the so called "chromatic number" of the graph that describes the 
neighborhood structure, and it is equal to the minimum number of colors needed 
to color the s ites of the lattice in such a way that no two neighbors are the same). 
Note that if he state of every site is allowed to take real (continuous) values, we 
may use a ni|merical simulation of the stochastic differential equation: 

df = -gradt/(/)cft + V2Tdw 
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to generate ;ample configurations from the desired distribution (see section 3.3 of 
chapter 2). 1 n this case, all sites can be updated at the same time, so that a parallel 
implementa|ion can reduce the complexity by a factor of N. 



6.2.1. Conve 



Geman and Geman (1984) established that the measure of the Markov chain 
defined by t ie Gibbs sampler will converge to the Gibbs measure independently of 
the initial sfc te, independently of the order in which the sites are updated (provided 
only that we keep visiting every site, i.e., that we update its state infinitely often). 
The converg mce of the parallel implementation, therefore, follows from this general 
result for wnlich we present here a simple alternative proof: 



W! 



First, 
i, every valu 



measure: 



gence of the Gibbs Sampler. 



note that from the definition of a MRF, it follows that for every site 
q e Qu and every configuration /, the conditional probability, 

Pr(/.- = <7 1 /y , ;V0>0 



Since b| hypothesis every site is visited infinitely often, this implies that any 
two states o( the chain will be mutually accessible (with positive probability) in a 
finite numbe • of steps, which means that the Gibbs sampler defines a regular chain. 

On the jbther hand, the Gibbs measure 7r(/) is an invariant probability vector 
of the chain To see this, suppose that at time t, just before updating site i, the 
possible conjurations of the field F(t) are distributed according with the Gibbs 



PT(F(t) = /) = *(/) 



After the up< ate we have: 



Pr(fF(* + 1) = /) = Pr(Fi(t + 1) = /,- | F j (t) = / y , j ^ i) 
.Pr(F y (t) = / y , jV0 = 
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pit ted 



because, by 
selected ran 
now com 
(see KemenU' 
(i) Has a 

(ii) The 
measu 



me isure 



r : 



Note 
theGibbs 
on the satis 
chapter 2), 
order. We 
parallel u 
parallel imp 



that 



Let/w, 
that / = {/, 



= *(/,• I /y , jV0-*(/j , jV *') = *(/) 

the definition of the algorithm, the new state of the i th element is 
iomly according with the conditional Gibbs distribution. The proof is 
by remembering a well known theorem for finite Markov chains 
and Snell, 1960) that establishes that every regular Markov chain: 
ityiique invariant probability measure. 

of the chain will converge (with probability 1) to this invariant 
independently of the initial probability distribution of the states. 

;, unlike the Metropolis and Heat Bath algorithms, the convergence of 

sampler does not depend on the reversibility of the chain (or equivalently, 

action of the "detailed balance" condition given by equation (3) of 

« Jthough this condition will hold if we use it with a random updating 

now see that the reversibility will not hold in general if we use a 

pdiiting scheme, which will make the first two algorithms unsuitable for 

ementations. 



6.2.2. Breakdown of Reversibility for Parallel Updating. 

To sho\| why this condition is violated (by the three algorithms) when a parallel 
updating scr erne is used, we will consider a first order, binary MRF on a lattice L 
with Ising p|tentials, that is, 

/,- e {0, 1} for alH e L 

f-l, if |t -i| = land /,• = /,- 
1, if |i-y| = iand/ t ^/ y 

0, otherwise 

To impl jment a parallel updating scheme, we divide the sites of the lattice into 
two non-ov( rlapping sets, which we will call B and W (the sets of "black" and 
'white" sites! respectively) as illutrated in figure 7. 



Vc(A, /,-) = < 



"d denote the state of the elements belonging to W and 2?, respectively ,so 
,//}}. The parallel updating scheme consists in updating first, say, all 



v 
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Figure 



• o • o • 
o • o • o 

• o # o • 
o • o • o 

• O • O • 

. Non-overlapping sets for parallel updating (sec text) 



Let P M 
of all the 
transition 
they are 



and similar! 



Now, 
"white-blacl 



the white si es, and then ail the black ones. Note that the random variables associated 
with any tu o sites of the same color are conditionally independent (given the state 
of the elem :nts of the other color), which means that the order in which their state 
is updated i s immaterial, so that, in fact, they can be updated simultaneously. 



,Po, denote the transition probabilities corresponding to an update 
white and black sites, respectively. Note that both Markov chains with 
p obabilities Pw and P B satisfy the detailed balance condition (although 
not regular), so that for a fixed f Bt we have: 



cle irly 



4w({fw, fa}, Ow, fa}) = n _Y™'{ B l Pw({fw, /*}, {fw, fa}) 



'(Avi/b) 



, for a fixed fw* 



Ww, fn}, {fw, f B }) = ffi'fo ftKAir, /*}, {Mb}) 



t(/w,/b) 
where n is tile Gibbs measure of the complete configuration / = {/^,/b}. 



let 



Pwnifyf) be the transition probability associated with a complete 
" update (where the white elements are updated first). We have: 



A* *(/, /) = Pw{{fw, fa}, {fw, fB})P B ({fw, fah Ow, fa}) = 
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where Pbw 
sites visited 



Now, 



and let 



Clearly, 



and so, 



n(fw>fB)' 

is the transition probability of the converse "black-white" update (black 
first). 



qpnsider the particular configuration: 

iew 
ieB 



— Pw{{fwi //*}> {fw, In}) 
•Pn({fw,fn},{fw,fn}) 



A 

*(fw> fa) 
n{fw, fa) 

A A 

"WW* Sb) 



"< 



fi = 1 for all i e L 



PB\v{hf)>PwnCfJ) 



<f)PwB{f, f) > *(f)PwB(f, f) 



so that the < etailed balance condition does not hold. 

The ab Dve argument can be easily generalized to show that if we use any 
prescribed i pdating order (such as lexicographic order), the Markov chain generated 
by any of th s three algorithms will also become irreversible. These chains, however, 
will remain regular, which means that in each case, the probability distribution of 
the configui itions generated by the chain will converge towards a unique invariant 
distribution In general, however, it will not be possible to guarantee the coincidence 
of this invai iant measure with the desired Gibbs distribution, except in the case of 
the Gibbs si mpler. 

An example of a situation in which the invariant distribution is not the 
Gibbsian m jasure, can be obtained by running the Metropolis algorithm, either 
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with lexicoghiph 
section 2 of 
of the infin 
are almost 
predictions 
order is use 
Bath schem< ) 



ic or "black-white" updating order for the Ising model discussed in 
chapter 2. If the natural temperature is below the critical temperature 
te lattice, the algorithm will produce equilibrium configurations that 
ompletely uniform, and therefore, inconsistent with the theoretical 
and with the behavior of the same algorithm when random updating 
i). The Gibbs Sampler (which in this case is equivalent to the Heat 
, on the other hand, produces consistent results, as expected. 



6.3. Discussi on. 



The 
time) for thej solution 
our general 



previous results mean that the expected computational cost (execution 
of a reconstruction problem on a large parallel machine, using 
vfonte Carlo procedure, will be given by: 



where n is 
graph of the 
Sampler, giv 



An e 
Machine" 
processing 
It is a " 
256,000 proc 
4K bits of 
square, 
computation 
a "Cross Om 



Sin^ e 



Besic es 



At each 
one microsejond 
bit is transmitted 
be implemented 



C P = nKC GS 



(21) 



tie 



number of (global) iterations; K is the chromatic number of the 
underlying Markov model, and C GS is the updating cost of the Gibbs 
n by equation (19). 



xa npk 



(1 illis, 



e of such a massively parallel architecture is the "Connection 

[illis, 1985). This machine was originally designed for the parallel 

o^|structured symbolic expressions, such as frames and semantic networks. 

Instruction Multiple Data" (SIMD) array processor consisting of 

:ssing units (each with a single bit Arithmetic/Logical unit, and about 

forage) organized in a four-connected lattice that is 512 elements 

this nearest-neighbor connectivity, it will also be possible (although 

lly more expensive), to connect any two processors in the array using 

:ga" router network (Knight, in Winston, 1984). 



cycle of the machine, for which we will assume a duration of 

, an instruction is executed by each processor, and a single 

to its neighbors. This means that the updating scheme can 

most efficiently if the field is first order Markov, but higher 
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order processes can also be implemented without using the router by successively 
propagating the transmitcd state (the execution time, therefore, will grow linearly 
with the ord ^r of the field). 



To maklb 
finding the 
(i.e., the 
will analyze 
estimator is 
the use of 
integers. We 
cycles of a si 
or comparison 
generating a 
16 cycles foi 
exponential. 



these results more concrete, consider, as an example, the problem of 

jbtimal estimate for an M-ary, first order MRF with lsing potentials 

segmentation of a piecewise constant image) from noisy observations (we 

this problem in detail in the next chapter). Let us assume that the 

be implemented in the "Connection Machine", and suppose that by 

appropriate scaling factors, all the numbers can be represented as 16-bit 

will use the following conservative assumptions: We assume that 16 

e 1-bit processor are needed to perform 16-bit addition, substraction 

; 16 2 cycles to perform multiplication or division; 2 x 16 2 cycles for 

pseudo-random number with uniform distribution on a given interval; 

memory transfer operations, and 6 x 16 2 cycles for computing an 



o 



ngl< 



time we get, 



this approacl 
more 

continuous 
smoothness 



where w is a 
section 2.2). 

This scheme 
discrete (e.g., 



Assumir|g that we run 250 iterations of the system, and ignoring the overhead 
from (19) and (21), 



Cp ?=^ 1.4 (M — 1) seconds 



(22) 



Althoug t this execution time may be reasonable in many cases, it is clear that 

becomes impractical as M becomes large. In this case, it might be 

convenient to approximate the field by one in which the state at each site takes 

vi lues in a compact set and, provided that U P satisfies the appropriate 

c mditions, use the stochastic differential equation: 



df = -gradC/p dt + V2Tdw 



(23) 



Wiener process, to simulate the behavior of the system (see chapter 2, 



will not work, however, if some of the variables are intrinsically 
?inary variables indicating the presence or absence of a boundary). In 
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this case, it 
discrete variables 
ones using 
determined, 



solv ng 



might still be possible to use a mixed scheme in which the state of the 
is updated using the Gibbs Sampler, and that of the continuous 
uation (23), but the precise form of such mixed schemes has not been 
nor their convergence properties established. 



<q 



These 

ways of 

following di 

(i) Design 
ing the 

(ii) Design 



algorith ns, 



considerations provide us with a strong motivation for finding alternative 
these problems. In particular, much more research is needed in the 
ections: 

)f more efficient (possibly deterministic) algorithms for approximat- 
Dptimal estimators for particular classes of problems. 

Df analog and hybrid networks for implementing these kinds of 



We will 
in the follow |ng 



study these possibilities in detail, in the context of specific problems 
chapters. 
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Chapter 4 



RECONSTRUCTION OF PIECEWISE CONSTANT FUNCTIONS 



1. Introduct 

In this 
developed, t 
observations 



chapter we will apply the optimal Bayesian estimators that we have 

) the problem of reconstructing piecewise constant functions from noisy 

The efficient solution of this problem is relevant for several reasons: 



(i) Binary 
useful 
and malh 



mages (or images consisting of only a few grey levels) are directly 
many interesting applications (for example, object recognition 
ipulation in restricted (industrial) environments). 



n 



(ii) Several 
(Elliot, . 
or the 
(1976)), 
constanj 



(iii) As we 
piecewiae 
be adeq 
processe 
way. W( 



estimatu n 



2. Problem F emulation. 



on. 



Derceptual problems, such as the segmentation of textured images 
t. al. (1983); Hansen and Elliot (1982); Cohen and Cooper (1984)), 
Drmation of perceptual clusters (O'Callahan (1974); Marroquin 
can be reduced to the problem of reconstructing a piecewise 
surface. 



\ ill 



see in the next chapter, where we treat the reconstruction of 

smooth surfaces, the boundaries between continuous patches can 

jately modeled by binary fields coupled with continuous valued 

. These coupled systems are very difficult to analyze in a rigorous 

hope to increase our understanding of them by studying first the 

of binary fields. 



Followir g Geman and Geman (1984), we will model the behavior of piecewise 
constant func ions using first order MRF models on a finite lattice with generalized 
Ising potentu Is: 
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We wil 
site will be: 
at a corner, 



use a free boundary model, so that the neighborhood size for a given 
4, if it is in the interior of the lattice; 3, if it lies at a boundary, but not 
md 2 for the corners. 



The Gi >bs distribution: 



defines a or 
constant pati erns 



Using tike 
section 2.1 o 
that chapter: 



with 



Of particular 
taken as the 
1975), so tha 



•1, if \i-j\ = land /,- = /,• 
Vcifi, fj) = j 1, if \i - j | = l and U ^ j j 

0, otherwise 

fi £ Qi : = {91 > • • -, qm } for all % 



(1) 



Pf(f) = Jexp[-lc/ (/)] 



(2) 



e parameter family of models (indexed by T ) describing piecewise 
with varying degrees of granularity. 



general stochastic model for the observation process presented in 
chapter 3, we get the posterior distribution given by equation (6) of 



^/|y(/^) = ^exp[-C/ P (/;g)] 



tW;0) = 7rtfo(/) + £ *(/, 9i ) 
*° ies 



(3) 



interest will be the case of binary fields (M = 2) with the observations 
tput of a binary symmetric channel (BSC) with error rate e (Gallager, 



:■ 



P(g 



„*-£ 



r (l - <0, for 9i = /,- 
for 9i jL f. 
In this case, t|ie posterior energy reduces to: 



U P (f; 9)= ¥n Z V(f it f^ + a £(1 - *(/, - 9i )) 

*J i 



(4) 
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where /,- e 



[91,92}; 



and 



it 



Note that 
modifying 
variables {/ 



tie 



which 

spatially varying 
The impi 
tools developed 
the estimation 
hand, it is 
purpose "quantum 
speeds. 



In the 
Bayesian estimators 
important 



Once the 
problem by 
in chapter 3, 
To understand 
the estimatior 
error rate e. 



*( 



10, 



if a = 
otherwise 



.- ,.(!=!) 



(5) 



(6) 



this case (and also in the case of additive white Gaussian noise), by 
constant Z Pt and applying a suitable linear transformation to the 
}, so that Qi = {-1, 1}, we can write the posterior energy in the form: 



i o .\y:|,--/|=i 



(7) 



corresponds to the Hamiltonian of an Ising ferromagnet coupled with a 

external magnetic field (whose magnitude is proportional to g). 

of this connection is twofold: on the one hand, it means that the 

for the equilibrium behavior of these systems — which is what 

process is about — may be relevant for the physicists. On the other 

co iceivable that one could use physical ferromagnets to construct special 

" computers that could solve estimation problems at atomic 



fallowing sections, we will study the relative performance of different 
;, and design efficient algorithms for approximating them in some 
pahicular cases. 



3. Relative P rformance of Bayesian Estimators for Binary Fields. 



posterior energy has been determined, one can solve the reconstruction 

f nding the optimal Bayesian estimate of the field /. As we discussed 

lowever, we have several possible choices for the optimality criterion. 

the differences in their performance, we will now analyze in detail 

of binary fields, when the observations are the output of a BSC with 
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Since tJhe 
(13) and ( 
performances 
error criteri >n 



with 



where N is the size of the lattice, and the expectation is taken over all possible 
configurations / and g. 



In particulai 



field is binary, the MPM and TPM estimators (defined by equations 
of chapter 3, respectively) coincide. The question is: how do the 
of the MAP, and say, TPM estimates compare with respect to the 

e = E[e.{f,f)] 



N 



«. = £U -*(/.• -/.•)) 



»=1 



we are interested in the ratio: 



e MAP 

Ztpm 



= E/, g exp[-E/ P (/; y)]c,(/ t f MAP (g)) 
£/,</ ex Ph^(/; 9)]c*(f, JtpmW) 
The numeri|al evaluation of this expression is feasible only for small values of AT. 

In figurjfc 8 we show a plot of the ratio r for a 2 x 2 lattice, for different values 
of the error ate e and the natural temperature T . As expected, r is never less than 
1. In the wc rst case (for e == 0.1 and T = 0.2) the error of the MAP estimate 
is 1.17 time; that of the MPM estimate; if T is not too small and e is not too 
large, both t stimates coincide, and as e approaches 0.5 (low signal to noise ratio), 
the MPM estimate is consistently better than the MAP. An experimental analysis 
of larger latt ces reveals a similar qualitative behavior, but the values of r are much 
larger in this] case (see table 1). 

3.1. Example 



We now 
it in more 
with free bodhdaries 



return to the example presented in figure 6 of chapter 3, and examine 

. Panel (a) represents a typical realization of a 64 x 64 Ising net 

, using a value of T = 1.74 (0.75 times the critical temperature 



d <tail 



67 



of the lattice) 
rate e = 0.4; 
MPM estimatd 
algorithm and 
values of the 
error (e,/64 2 ) 




F 'g" re 8 - R itio of the average errors of the MAP and MPM estimators for a 2 x 2 Ising net. 



panel (b), the output of a binary symmetric channel with error 
>anel (c) the MAP estimate, and panel (d) an approximation to the 
(which we will label "MPM (M.C.)") obtained using the Metropolis 
equation (10) to estimate the posterior density. The corresponding 
Dosterior energy U P (equation (13)) and the relative segmentation 
ire shown on table 1. 
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Energy 
Seg. Error 



4. Exact Algi rithms for the MAP Estimator. 



the 



K 



From 
noise ratio 
one can desibn 
the case of 
which 

is 0[N) (the 
needed, and 
distributed in 
lattice length. 



com pi tes 



To sim 
loss of generality 
this form by 
stationary, we 



From 
equivalent to 



Table I 



discussion of the previous section, it is clear that if the signal to 

not too low, the MAP criterion may be an appropriate choice, if 

efficient algorithms for computing it. As we will now show, in 

oj(ie-dimensional binary fields, one can in fact construct an algorithm 

(exactly) the MAP estimate with computational complexity which 

ength of the lattice) in a serial machine: at most 22N operations are 

the storage requirements are also O(N). The algorithm can also be 

a parallel architecture, making its execution time independent of the 



/ 






-5594.8 



226.0 
0.4 



6660.9 
0.33 



-6460.0 
0.128 



-6427.0 
0.124 



plfy 



the notation, we will assume that /,- e {-1, 1} for all i (there is no 
in this asumption, since any binary process can be brought into 
reversible linear transformation). Also, assuming the noise process is 
introduce the notation: 

where T is tf e natural temperature of the field. 



equations (1) and (3), it is clear that the MAP estimation problem is 
he minimization of: 
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tic 



where n is 
configuration 
be reduced 
for the odd 
blocks). We 



number of places where /,- =^ /,-+, (the number of odd bonds of the 
). From this expression, it follows that the MAP estimation process can 

the problem of finding the optimal value for n, and the best locations 
xmds ( which we will also call "boundaries'* between constant-valued 
will now present a procedure for performing this task. 



ID 



Description 



>f the Algorithm. 



The ide i in which this method is based is the following: 



stait 



We 
estimate k e 
is initially 



se: 



scanning the sequence {&•}, say, from the left, with some initial 
{-1, 1} for the value of / in the block that starts at / (a pointer that 
t to 1). 



by putting a 
[/ ,i], that is 

where 



As we will 
to Zo) needs 



hold simultaneously 
maximum lik 



Whenever we process a new observation gy, we ask if we can get a lower energy 
boundary in j and in the best possible location / within the interval 
we ask if: 

U b + 1 < U p 



see 



t3 



u P = E *+*(*) 

U b = 1 + £ ¥ +jk fo) + J2 *_ k ( gi ) 
*=/o i=M-i 

below, the optimal boundary location / (which is initially set equal 
be updated only if the conditions: 



•ML 



fr ^ k 



rML / f ML 
Jj T* Jj'-l 

t=/o t=/ 



, in which case / is set equal to j - 1. Here, f ML denotes the 
^lihood estimate; since we are using a white noise model, it is given 
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by: 



8-t 



If we 
update the 
estimate foil 



Otherv ise, we just set fj = /c, and continue to process the next observation. 



When 
process bac 
to make thi 5 
that we cat 
parallel ove 
subsequences 
in /. The 



Formal 



/ 



ML 
3 



1, if¥ + i(gy) < *-i(#) 

-1, otherwise 



a lower energy by putting a boundary at /, we set / t - = it for % e [k, l]\ 
value of the pointer / by setting it equal to / + 1, and set the new 
the value of /, in the block that starts at l , equal to -A:. 



we reach g N> we take f N as the initial estimate and run the same 

wards to get the final solution (in fact, one can show that it is possible 

backward run as soon as we get the second boundary). This means 

implement the algorithm in a distributed fashion, by processing in 

laping subsequences of {&•}, provided that the length of each of these 

is greater than twice the length of the largest constant-valued block 

solution is then obtained by pasting together these partial estimates. 



final 



y, the algorithm is as follows: 



Definition ' Variables. 

i: Current p osition. 

/ : Pointer t ) the beginning of the current region. 

/: Current 5timal location of the boundary in the interval [l ,i]. 

k: Current « stimate for f([l 0t I]). 

U p : Energy ncrement associated with the assignment f{[lo,i]) = k. 

U m : Energy increment associated with the assignment f{[lo,i]) = -k. 

U h : Energy increment associated with the assignment /([Jo,*]) = k)f((l,i]) = 

si: Best loca (maximum likelihood) estimate for /,-. 

siml: Best 1 >cal (maximum likelihood) estimate for /..j. 
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-k. 



U pl : Energy increment associated with the assignment f{[lo,l]) = A:. 

U rnl : Energjl increment associated with the assignment f{[lo,l\) = -k. 

U temp : Tem|orary storage register. 

M: A very |arge positive number. 

K : Switch ndicating the method for estimating f i% 

Algo rithm A1(K ): 

1: Initialization. 



Set 
Set 

Set 



Begn 



l = I = 1; U p = Urn = U ml = 0; U b = 1; U pl = M. 

fc = 1, if K = and * +i(gi) < tf _ x ( gi ) ; 
-1, if iiT = and * +1^) > # _!( gi ) ; 
/Co, if K ?± 0. 

stml = A; 



2: Mj in Loop: For i from 1 to N do: 



Sit 



si = 1, if tf +1 (fl) < # .i^.) ; 
—1, otherwise. 



1: See if the optimal boundary location needs to be updated: 

[f (at ^ k and at ^ siml and U p - U pl -U m + U ml < 0) do 

Update boundary location: 

Set : 

I = t - 1 
Upi = U p 

U m l = U m 

U h = U p + l 
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L2: Update energy increments: 
Set : 

U p = U p + ¥ +k ( 9i ) 
U b = U b + * _*(*■) 
J.3: See if a new boundary has to be introduced: 
If(t/ fc + l < tf p ) do : 
Introduce a new boundary: 
For j from / to / do : Set /y = A; 
Set : 



*; = 


-k 




<0 = 


= / + l 




"temp ==: t/p 


Upi 


c/ P = 


= U m — U m i 


t/m 


== Utemp 




u pi 


= M 




u b = 


= U m + l 





I A: Set siml = si 
Enc 

3: Se j if the last boundary has to be introduced: 
If (|/ 6 < !7 p )do: 

1.1: For j from l to / set /y = it. 
1.2: Set Jo = * + 1. 
.3: Set k = —k. 
4: Fif the last region: 

For H from / to AT set /y = k. 
End. 
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The proof of the fact that this algorithm will in fact find the global minimizer of 
(7) is presei ted in appendix 4.A. 

In app< ndix 4.B we present an alternative approach to this minimization, which 
is based on iynamic programming ideas. The resulting algorithm is less efficient than 
the one we lave just presented for the case of binary fields, but it has the advantage 
of being extensible to handle more general situations. Also in this appendix, we 
compute th : probability distribution for the number of odd bonds, and discuss the 
relationship between the dynamic programming procedure, and the use of linear 
filters to pr< duce multi-scale descriptions of piecewise constant signals. 



tech 



The 
MAP estim 
here is that 
one dimensional 
number of 
The questioji 
the optimal 
efficient thaik 



5. Estimatio n of Two-Dimensional Binary Fields. 



iques developed in the last section for the exact computation of the 

te cannot be extended to the two-dimensional case; the main difficulty 

tie geometry of the boundaries between uniform regions (which in the 

case are simply points), causes a combinatorial explosion of the 

possible configurations compatible with a given total boundary length. 

, then, is whether it is possible to find algorithms that approximate 

estimates (with respect to the selected error criterion), that are more 

the general Monte Carlo procedures presented in chapter 3. 



5.1. MAP E timator. 



fcr 



(1971) 



In the 
algorithm 
of sites (in 
Wilson 
of critical p 
each of thes! 
blocks in 
entropy 



( ase of the MAP estimator, the efficiency of the Simulated Annealing 

the minimization of U P can be improved by defining large "blocks" 

a manner that is reminiscent of the "block-spin" strategy used by 

in connection with the renormalization group approach to the study 

lenomena); the optimal estimate for the average value of the field in 

blocks is found, and then progressively refined by subdividing the 

annealing stages. We will now show that, if we use a maximum 

assumption, the structure of the MAP estimation process for Ising models 
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sue cessive 



is invariant 
(i.e., the 
correspond 
natural 
by a factor 
simulated 
than for thd 



under the "blocking" transformation; this means that the ground state 

estimator) of the aggregated process (with blocks of size L) also 

to that of an Ising model with a coupled external field, in which the 

is scaled by a factor of l/L, and the noise (coupling) parameter 

of I?. As a consequence of this scaling, the final temperature for the 

apnealing of this smaller network will be approximately L times larger 

original problem. 



MAP 



tern >erature 



Let us 
of a binary 
posterior en 



consider a binary Ising net / with the observations taken as the output 
symmetric channel with error rate e. From section 2, we know that the 
^rgy will be: 



with 



and 



Notice that 






(8) 



?(/, 



(0, 

'•'"Hi, 






.-,.(1=5) 

:quation (8) can also be written in the form: 

ff = i-E vote, fi) + « £ «?(/» «) 

ij i 



(8') 



where V c ,q\ are continuous functions satisfying: 

Vc[z> y) = V(x, y) and 

<lc[x, y) = q(x, y) for x,ye {0, 1} 

We will nov derive an expression for the energy in the "block spin" case. Let us 
partition the original lattice L into square blocks of side L. The "block observations" 
g L will now >e the density of l's on each block, i.e M 



9L{i) = 72 E 9j 



L 2 
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JEBi 



where D{ is 



For a 
configuration! 
are randomly 



1. Interactiolhs 
Thei 



between adjacent blocks: 
interaction between two adjacent blocks % and j will be: 

Uj = [-1 • {P n + Poo) + 1 • (Pio + Poi)] • L 



where P kl is 
to an elemei t 



Substituting 



tern 



This 
11,10,01 and 
are 2L(L - 1] 



the i th block. The "block field" f L is defined in a similar way. 



iven f L , we compute the energy by assuming a maximum entropy 
, which occurs when the Ts that correspond to the given density f L (i) 
distributed within the block. The energy will have three terms: 



the probability of having an element with state A; on block % adjacent 
with state I on block j: 

Pn = M%VlU) 

Poi = /l(j)(1 - /l(0) 

Pio = /l(0(1 ~ hU)) 

Poo = (l- /l(0)(1 - fiU)) 

liese values we get: 



hi = L[2{f L {i) + /,(;)) - 4/ L (t)/i(y) - 1] 



2. Interactior s within each block: 



depends on the relative frequencies of the clique configurations 
00 (ph,pio,poi and poo, respectively) on each block (note that there 
different cliques). Since the l's are randomly distributed we get: 

Pn = /lW 2 

Pio = Poi = fdi)(l - /lM) 
Poo = (1 - /lW) 2 
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so that the jpternal interaction /,• is: 

/,- = 2L(L - l)(-4f L (i) 2 + 4//X0 - 1) 

3. Interactic n with the observations: 

Assumi ig that the l's in the observations and in the field are independently 
ve get: 



distributed 



Finally, 



note that \hi 
respectively. 



I oba (i) = aL 2 [f L (i)(l - g L (i)) + (1 - f L (i))g L (i)\ = 
= aL 2 lf L {i) + g L {%) - 2f L (i)g L {%)] 

the energy takes the form: 

Udh) = ~ E In + E(^'. + Iobsii)) = 

io ,,y t - To 

= M^" E(2(/lW + h(j)) - 4/l(0/lW - 1] + 
+ ^ " 1) E(-4/lM 2 + 4/ L (0 - 1) + 

t 

sums are taken over pairs of adjacent blocks, and over all the blocks, 
For L = 1, this expression reduces to (8') with 

Vcifi, fi) = 2(fi + fj) - 4/,-/y - 1 



<7c( a > 6) = a + 6 — 2a6 
For L > 1, t ie quadratic terms of Ui are: 



r [-4 E /lW/lW - s(^ - 1) E AW 2 ] 
*ij * 



and since 



-2 £ /tM/itf) + 2 £ / L (i) 2 = £(/,,(;) - /!,(,-))« > o 
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it follows th it 



and 



which impl 
constrained 
lie in a cor 
to find the g 
the energy 
constant): 



MS 



m r 



t) 



The minimUlm 
representaticjns 
refinement ( 
solution as a i 
(the MAP es 
At present, 
determining 



The relative 
should be assessed 



EAW 2 >E/lWAW 

1 ij 



-4 E /iWAO") - s(i - 1) E hi*? < 

<-(4 + 8(L-l))£/ L (0 a <0 



that U L is negative definite for L > 1, and therefore, its minima, 
o the hypercube [0, l]"'- (N L is the total number of blocks) will always 

of such hypercube, which means that we can use simulated annealing 
obal minimum of U L% constraining the search to {0, 1} Nl . In this case, 

be minimized takes the simpler equivalent form (up to an additive 



U L = 



£ V{f L {i), f L (j)) + aL 2 £ q (f L (i), g L (i)) 



energy solutions for each L can be interpreted as "coarse scale" 
of the original pattern /. Once a solution is obtained, the next 
or blocks of size L/2) can be efficiently obtained using the previous 
tarting point, and initiating the annealing process at a lower temperature 
imates presented in this chapter were obtained using this technique), 
hlbwever, we do not have a good method (other than trial and error) for 
he optimal values for these initial temperatures. 



Also in this connection, the work of Blake (1983, 1985) should be mentioned. 
This author )roposed the minimization of an energy function similar to U P as a 
pragmatic cr terion for restoring piecewise constant images. He also proposed an 
algorithm, b ised on the successive approximation of U P by a family of convex 
envelopes to find an approximation to the global minimizer. 



performance and computational efficiency of these various schemes 
experimentally. 
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5.2. MPM E ;timator. 



In the case 
algorithm wlflose 
error) is equ| 
following i 



ideas 



First, we) 
We will 
P G with the 



of the MPM estimate, it is possible to construct a fast deterministic 

experimental performance (in terms of the average segmentation 

valent to the Monte Carlo method discussed above. It is based on the 



recall that for a binary pattern, the MPM and TPM estimates coincide, 
approximate the posterior mean of (3) by that of a Gaussian distribution 
>roperty: 

PgW = ^re- Up W for all h e {0, 1}. 

Zip 



In partic iilar, we use: 



fa(h) = -L exp[-i- £ E 0* " ^f ~ « £(* - 9if\. 



where 



where 



Z G 



Ni = {jeL : ||z-;|| = l}. 

For this distribution, h is the (unique) minimizer of the convex function: 

W = kEft- Ay) 1 + «E(* - Vi? 
J ° » yeJVi .• 

which corresponds to the unique fixed point of the system: 

, (i+i) _ Ejenhj +aTogi 
\N { \ + <*T 

We coul4 now approximate our estimate by putting: 

f'i = efc) 



(9) 



9(x) 



lo, 



otherwise 



(10) 
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There is an 
be shown 
the MPM 



tt at 



additional consistency condition that / must satisfy, however. It can 
when the posterior distribution has the form given by (3) and (4), 
estimate /, which by definition satisfies: 

Pi\ g (fi>9)>Pi\ g ((i-fi);g) 



also satisfies 



which meani 
a new MPM 
proof is inclftded 
get that it mlist 



with 



Note that the 



acts as a 
(Vidyasagar, 



Lya] >u 



This algcjrith 
we extract all 



Pi\ g (fi',f)>Pi\ g ((l -fi)',f) 



(11) 



that if we replace the observations by the MPM estimate, and compute 
estimate for this modified problem, we should get the same result (the 
in appendix 4.C). Translating this condition to the case of/*, we 
satisfy: 

fi = e(/i*) (12) 



where h* sallisfies: 

* _ Ejen.^j +aTb0(/t t ) 

|N,-| + ar 

In practice, \|e get h* as the fixed point of the system: 

\Ni\ + aT 



(13) 



hW = h 



function: 



Uh(h)= £ (fc ~ ^) 2 + aT ^{hi - B(/i t )) 2 

iJENi i 



nov function for the system (13), which is therefore (locally) stable 
978). 



ithm can be visualized as operating in two steps: In the first one, 
the information that we need from the observations and encode it in 
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h (which is c Dntinuous-valued), and in the second one, we find the closest binary 
pattern that i atisfies the consistency condition (11). 

To illus rate the performance of this approximation, we show / , for the 
example disc jssed above, in panel (e) of figure 1, and its corresponding energy and 
segmentatior error in the last column of table 1 (labeled "MPM det"). 



5.2.1. Paralkl Implementation. 



The dynam 
directly in a 
a processor 
Each update 
multiplications 
experimenta 
needed for 
total execution 
magnitude diver 



ical systems defined by equations (9) and (13) can be implemented 

parallel architecture, such as the "Connection Machine", by assigning 

to each site, and updating the state of all sites at the same time. 

will require, for both systems, at most 10 (16-bit) additions and two 

, that is, a total of 672 cycles of a 1-bit processor. We have found 

ly that in most cases, less than 50 iterations of (9), and 100 of (13) are 

|onvergence, so that, using the figures of chapter 3, we estimate the 

time as approximately 0.1 seconds, an improvement of one order of 

the general Monte Carlo procedure described in that chapter. 



Here, N{ is 
voltage of ti 
i and j\ U 
the internal 



5.3. Analog jfJetworks. 

Hopfiellft and Tank (1985) (see also Hopfield, 1982 and 1984) have studied the 
behavior of "neural" analog networks of non-linear amplifiers interconnected by 
resistors, wh :>se dynamics can be described by the differential equations: 



at jeNi t 

fi = e(« t ) 



(14) 



the 



the neighborhood of node i\ u { and /,- denote the input and output 
i th amplifier; T t y is the conductance of the link between the nodes 
s a fixed current injected at node i, and r, a constant depending on 
resistance and capacitance of each amplifier. The gain function of the 
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amplifiers, 6 
interval (0, 1). 
(hence the 



te m 



where /? is ca 



These re learchers have proved that the system (14) is always stable, provided 
we have T i3 - -- = 7y, for all i,;*, and in the high gain limit (for (3 > > 1), the stable 
fixed points v ill be local minima of the "energy" function: 



£(/) = 4 £**//.-// -£/.-* 

1 «,/ i 

Note th4 we can write (14) as: 

du{ dE U{ 



ha e 



They 
points of (18] 



with 



These 

1965) to the 
the Gibbs 
can be showri 



The mea i 
node z, which 



•) is chosen as a sigmoid function that restricts the output to the 
and has a form similar to the observed response of biological neurons 
"neural"). In particular, one can put: 

1 



0(u) = 



1 + exp[— f)u] 



(15) 



led the "gain parameter". 



(16) 



(18) 



dt dfi t 

fi = e(u t ) 

also pointed out that if one uses the gain function (15), the fixed 
will satisfy: 

1 



fi 



Hi(f) = 



l + exp[-/?rtf t -(/)] 



dE 
dfi 



E 21/// + U 

JEW 



(19) 



(20) 



equations will also be satisfied by the mean field approximation (see Reif, 
jnsemble averages of a binary process / (/ t e {0, 1}) with respect to 
measure generated by the energy (16) at a temperature T — 1/(3t. This 
as follows: 



field approximation is obtained by assuming that the local energy at 
is: 

Ei[f) = -/,-[ E Tijfj + Ii] = -f i H i (f) 

JEW 
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can be appr >ximated by: 



where f 3 - 
temperatu 






delhotes the ensemble average of /y. Since {/,} are constants for a given 
we can compute /,- as: 



re 



This means 
approximation 
in general 
the initial 
the general 



l + expl-HiCf)/T] 

hat there is a fixed point of equation (18) that can be interpreted as an 

of the ensemble average of a corresponding binary MRF (note that 

fixed point will not be unique, and will depend on the selection of 

conditions; the lack of an adequate criterion for making this selection in 

i ase represents, at this point, a serious limitation of this approach). 



tl is 



so that 



In this form, 



as /?> = 4 
Since for a 



approximate 



We have performed 
good performances 



- E/,.^o,i/ t exp[-// l (7)/r] _ 
Jt £/,=o,i exp[-i/ t (/)/r] 



In the < ase of the posterior energy (4), if we require that / t - e {0, 1}, we can 
write it in tlie equivalent form (up to an additive constant): 

2|N t -| 



A i jeNi % i o 



+ a{2 9i - l)]fi 



Hi = 



dU P 4 V* / a. to ii 2 I N *I 
ofi io j eNi io 



one can construct directly the system (18), and defining the initial state 
P) = o.5 for all t, find the stable fixed point that will approximate /. 
binary system the MPM and TPM estimators are equivalent, we can 
the optimal estimate by: 



h: 



if?, < 0.5 

otherwise 



digital simulations of the system (18), and have found very 
for relatively high signal to noise ratios. For high error rates, 



83 



the behavioi of this approximation is similar to that of the MAP estimator. We will 
have to say nore about this approach at the end of the next chapter. 

6. Simultaneous Estimation of the Field and the Parameters. 

To app y the estimation procedures described in the previous sections, the 

parameters tf at characterize, both the prior model of the field (the natural temperature 

noise process, (the error rate 6, or the variance a 2 ) have to be known. 

practical cases, however, we are only given the noisy observations g and 

qual tative information about the structure of the field and the noise, so that 

tands for either log[(l - e)/e] or a) and T have to be simultaneously 



r ), and the 
In most 
general 
/, a (which 
estimated. 



The maip 
partition function 



which makes 



An alternative 
in detail the 



Equatioi is 
Jmpm deper d 



which means 
single parameter 



In principle, one could use again a Bayesian approach, and assuming prior 
independent uniform distributions for a and T (in the ranges [a , a 1 ] and [Tg,rJ], 
respectively)] find those a, f and / which jointly maximize the posterior distribution: 

Pif a Tn\g)= exp[-U P (a,T , f)\ 

difficulty here is the extraordinary computational complexity of the 



Z(ro) = £exp[-^r7o(/)] 



this approach impractical, except for very small lattices. 



approach is based on the following considerations (we will study 
base of a BSC; other noise models can be analyzed in a similar way): 



(9) and (13), which describe the deterministic approximations to 
on the parameters of the system, e and T , only through the product: 



1 = aT = To\og^-^j 



(21) 



that the behaviour of the algorithm is completely characterized by the 
7. In the case of the Monte Carlo approximation, if we fix the 
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< ondition: 



value of 7, the value of T cannot be chosen arbitrarily, since it has to satisfy the 
consistency 



a 



with 



For a g 
/ using the 
process z 
rate e using 
will be: 



We cover 
pixels wide) 



a 



log 



(¥)■ 



1 N 

= N £ *•" 
iV t=i 



(22) 



2V = 



(23) 



where z is tfle residual process defined as: 

0, otherwise 
This means that, given 7, the correct value of T can, in principle, be determined 
in an adapti ve way, so that in this case too, the behaviour of the approximation 
depends efftfctivcly only on 7. 



ar d 



ven value of 7, we can approximate the corresponding MPM estimate 
methods developed in the previous section, and compute the residual 
the conditional (on 7) Maximum Likelihood Estimate of the error 
equations (22) and (23). The corresponding conditional estimate for T 



n = 



a 



(24) 



To mealure the "likelihood" of the estimate /, we use the degree of uniformity 
(or "whitem ss") of the residual process z. This property can be quantified by the 
variance of 1 le local noise density, which we estimate as follows: 



the lattice with a set {S 3 } of m non-overlapping squares (say, 8 
For each square Sy, the relative variance of the noise density is: 

o, = 1 — — 1 (25) 



»-m 
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with 



where \S 3 -\ is 



the area of the j th square. 
The desired likelihood function is: 



which is equivalent to a x 2 criterion (Cramer, 1946) normalized to take into account 
the sample suze. 

Alternat vely, one can use directly the likelihood that the residuals come from 
a uniform diftribution. To compute it, we note that the quantities: 

ieSj 



with 



where K is a 
(26) and (27) 
or when for 
adopt 



E*. 



J 3\ iESj 






(26) 



are distr buted according to the multinomial law: 



Using the Stifling approximation we get the log-likelihood: 

m 
L(l>\, • . ., Vm) = log P(u U ... } Vm)^-Yl U i Io S "i + 



i/i\...u m \\mj 



n = Ne = vi + . . .i/ r 



t=i 



+n log (-) + - log ( — - — ) + K 



(27) 



:onstant. We have found experimentally that both likelihood measures 

have a similar behavior when n is large. When n is relatively small, 

:(ome t, i/,- = 0, however, (26) is preferable, and so, it is the one we 
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Note 
likelihood 
defined as: 



triat a more conventional likelihood function, such as the conditional 
p oposed by Besag (1972), will not work in this case; this function is 

lQ) = /,(/) + up) with 



n 

■ec\ 



cai 



One 
we can use 
and a loweil 
get 70 = • 
MRFis 
and Snell, 
i), while 



for 



2,1 



A A 

exp[--p EjeN, V(fi, fj)\ 



T 



exp[-£ Ey €A ,, V(f i} /,)] + exp[-f H jeNi V(l - f if fj)} 



where the M |odings M Ci and C 2 are the sets: 

Ci = {i : (x t - is odd and y t - is even) or (z»- is even and y» is odd)} 
C 2 =--{i : (xi is odd and y,-is odd) or (x t - is even and y,- is even)} 

with (x if yi) denoting the row and column indices of site i (notice that, given the 
value of the field at the sites of C u the random variables associated with any pair 
of sites of C 2 become independent, and viceversa). In our case, we find that as 7 
decreases, f becomes more and more uniform, while f remains almost constant. 
It is not dif icult to see that as a result, the conditional likelihood L will decrease 
monotonica|ly with 7, which renders it useless for our purpose. 

The raAge of values [70, 7m] of the parameter 7 that corresponds to the class 
of systems Jf interest can be determined as follows: 



show that for 7 > 8 we will always have f M PMi = 0* for ^ *» s° ^ fiai 

7 M = 8. The value of 70 can be obtained from an upper bound for e 

bound for T . For example, assuming that e < .45 and T > .5T C , we 

(Note that when the natural temperature T of a first order, isotropic 

bellbw 0.5 times T c (the critical temperature of the lattice; see Kindermann 

] 980), the patterns become practically uniform (i.e., /»• =constant for all 

values of T greater than 1.5T C , we get patterns that are practically 
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indistinguishible from white noise. Therefore, the assumption T > .5T C covers 
practically aJ| the interesting cases). 

The complete estimation procedure is as follows: 
Maximum L kelihood Estimation Algorithm: 

1: Sample tl e interval [70, 7m] at the points 



2: For each 
2.1: 
2.2: 



Fir d 



7€C = {7if-7n} : 
/( 7 ) using (9) and (13). 
Cojnpute z using (23). 

using (22). If e = 0, set L(f(i)) = -00 and proceed with the 



2.3: Cqpipute 
next value 



2.4: 

2.5: 

3: Computd 



Remarks: 



1. This estimation 
the noisy 
prior assumptions 
isotropic N 



70 < 7l» -"In < IM 



of 7. Otherwise, compute a and go to 2.4. 
Compute r using (24). 

Ccfmpute L(f{i)) usin g ( 25 ) and ( 26 >- 
the optimal estimate / using: 



/ =/(7*) : K/(7*)) = sup L(/(7)) 



(28) 



The corresf onding e* ,T will be the optimal estimates for e and T Q , respectively. 



algorithm allows us to reconstruct a binary pattern / from 

Observations g without having to adjust any free parameters. The only 

correspond to the qualitative structure of the field / (first order, 

RF) and to the nature of the noise process, but neither the natural 





(c) 




9|fed l*fel \Hm\ IV* 
'15! hi*! • I* \ \ r* 



..M>4k...a Jli 



(d) 



Figure 9. (a) 5 ynthctic image, (b) Noisy observations, (c) Maximum Likelihood Estimate, (d) A 
complete scrie; of estimates. The optimal estimate (for 7 = 2.9) is indicated by an arrow. 



temperature 
means that 
even if it haj 
algorithm to 
we show suan 
image (a) 
shown in 
shown in 



1 This pro:edun 
corruption p 
to the case 
is also strai 
of (9) and 



w th 

(c:. 



"hi PI m I* 



To nor the error rate c have to be known in advance. In practice, this 

j/e can apply it to restore any binary image with uniform granularity, 

not been generated by a Markov random process. We have used this 

reconstruct a variety of binary images with excellent results. In figure 9 

a restoration. The observations (b) were generated from the synthetic 

an actual error rate of .35 (assumed unknown). The MLE for / is 

A complete series of estimates f(l)* witn 1 varying from .5 to 3.5 is 

(d). 



paiel 



e can be easily extended to handle any one-parameter noise 

ocess (such as zero mean, additive white Gaussian noise). The extension 

)f N-ary fields, i.e., to the restoration of piecewise constant images, 

g itforward (using the general algorithm described in chapter 3 instead 

in step 2.1). As an example, we present in figure 10 the optimal 



(13) 
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Figure 1( 




s. ..)•£?/&•••••>• 



(a) 



(b) 



(c) 



(a) Original ternary MRF. (b) Noisy observations (additive Gaussian noise), (c) 



Optimal (Max nurn likelihood) estimate. 



restoration c f a ternary pattern corrupted by additive white Gaussian noise. 

3. We have found that the likelihood function (26) is reasonably well behaved as 
a function , if T This permits us to perform the one-dimensional search for its 
supremum i i an economical way, by first determining its approximate location using 
a coarse sar ipling pattern, and then refining its position by a fine sampling of a 
reduced int srval. In practice, it is possible to get very good results using on the 
order of 15 samples. 

4. The who e procedure is highly distributed, so that it is possible to implement it 
in parallel I ardware in a very efficient way. 

7. Formatio n of Perceptual Clusteis. 

At the ieart of a general purpose perceptual system, one must have a mechanism 
for decidin ; which parts of an image should be considered to "belong" together 
(Marroquir . 1976). A simple instance of this problem is the grouping of dots in an 
image into .erceptual clusters. Some heuristic schemes have been proposed to model 
this pheno. tenon (see for example, O'Callahan, 1974). We will show, however, how 
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this problem 
as a particular 



;an be formulated in an elegant way that is also biologically motivated, 
case of the reconstruction of binary patterns from noisy observations. 



where U Q (f) 



6 and a are 



The conceptual model for this formulation is as follows: 

Let us o msider the dots that form the original pattern as patches belonging to 
some objects of uniform color that are partially hidden, say, by some foliage. In 
this way, the formation of clusters is equivalent to the problem of reconstructing 
these objects (whose cohesive nature is modeled by a first order MRF with Ising 
potentials) fr )m observations that are formed according with the following model: 

Suppose that /,- = 1 only if an object overlaps the i th site of the lattice. We 
assume that he "foliage" will hide this point (i.e., make gi = 0) with probability 
e, and that s >urious values of gi = 1 can appear in sites where /,- = with a very 
small probability p: 

( 1, with prob. (1 - e), if /,■ = 1 
0, with prob. e, if /,- = 1 

0, with prob. (1 - p), if /,- = 

1, with prob. /?, if /» = 



9i = 



with p < < .. The posterior energy is: 



U P (f; g) = ±-U (f) + a £ (1 - *(1 - </,-)) + 

20 t: /,=l 

+M EC 1 " *(W» 
t:/.=0 

is given by (1) and (2): 

-1, if |* — i| = l and /,■ = /y 

V c (fi,fj)=h if K-;| = land/ t ^/ y 

.0, otherwise 



(29) 



».j 



defined in (5) and (6): 



if o = 
otherwise 
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and M is a v *ry large number. 



o 



task is now equivalent to the problem of estimating / and the 

and T from the noisy observations g. To accomplish this, we can use 

[iescribed in the previous section, except that in this case, only those 

whilh fi = 1 will be useful for the estimation of the residual density and 

nee. This means that equation (22) has to be modified to: 



The clustering 
parameters 
the method 
sites for 
its local vari 



where 



and z{ is 
the residual 
that cover 



these 



With 
for clustering 
original dot 
of 7 = aTb 
that these 
and psychophysical 
to model 



8. Discussu n 



In this! 
constant 
to this problem 
of a genertlized 



a 



<<~) 



8 ~wE* 



A={z : A = l} 



defined in (23). Also, the sets S, used to compute the relative variance of 
density in (25) should now be taken as the intersection of the squares 
lattice with the set A. 



tl.e 



modifications, the Maximum Likelihood algorithm can be used 

Its performance is illustrated in figure (11) where we show: the 

pattern (upper left) and the reconducted objects for decreasing values 

The maximizer of the likelihood is marked with an arrow. We believe 

I reliminary results are encouraging, although, clearly, more numerical 

experiments are needed to assess the plausibility of this scheme 

hillman perceptual processes. 



chapter we have addressed the problem of reconstructing piecewise 

functions from noisy observations. We showed that the optimal solution 

can be obtained from the observation of the equilibrium behavior 

Ising net coupled with a spatially varying (but fixed in time) 
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Figure 11 

and the rccon 
(i.e., the optim 



Formation of perceptual clusters. We show: the original dot pattern (upper left) 
t uctcd objects for decreasing values of 7 = aT . The maximum likelihood estimate 
~H clustering) is marked with an arrow. 



external field 
a criterion, 
MPM estima 



tie 



We compared 
found that fc r 
as the noise h :vel 
A consequen :e 
estimator ma r 
advantageous 
binary signals 



In the 
Monte Carlo 
efficient, from 
(Simulated 



v 



* 
• 

m : 


1 ■< 




. If we use the minimization of the expected segmentation error as 
optimal estimate is the maximizer of the posterior marginals ( the 
:or which was described in chapter 3). 



the relative performance of the MAP and MPM estimators, and 

moderate signal to noise ratios, they are practically equivalent, but 

increases, the MPM estimate is (sometimes dramatically) superior. 

of this analysis is that, if the noise level is not too high, the MAP 

be a reasonable choice in those cases where it is computationally 

This is the case, for example, of the reconstruction of one-dimensional 

where we derived a very efficient algorithm for its exact computation. 



tv o-dimensional case, however, the situation is different: the general 

j >rocedure for the approximation of the MPM estimator is in fact more 

a computational viewpoint, than the corresponding one for the MAP 

Annealing), and in the case of binary fields, we derived a much faster 



93 



deterministii scheme with excellent experimental performance. 



We also! 
interesting 
this case, a 
likelihood fii 
estimation o 



cise 



n aximi 



We 

on the case 
we have 
figure 10). 



poi it 



out that although, for the sake of simplicity, we have concentrated 

binary fields sent through binary symmetric channels, the results that 

pre|ented can be generalized to N-ary fields and other noise models (see 



(f 



The 
segmentation 
we presented 
perceptual 
is the reconduction 
in detail in 



showed how these estimation procedures can be extended to the more 

where the parameters of the system are not known in advance. In 

urn likelihood estimation algorithm can be derived, which, using a 

nction that is computed from the residuals, allows for the simultaneous 

the field and the parameters. 



constructions that we have presented can be applied not only to image 

and restoration, but to other problems as well. As an illustration, 

a novel application to the modeling of the process of formation of 

cl Listers. Another important problem that can be formulated in this way 

of surfaces from stereoscopic pairs of images; we will discuss it 

chapter 6. 
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appendix we present a proof of the fact that the algorithm presented in 
4, effectively computes the MAP estimate for a one-dimensional* 



In this 
section 4 of chapter 
binary MRF 

The opt||mality of the algorithm follows from the following propositions: 



Proposition 

suppose that 
detected by JU 



Proof: 

Suppose Ik 
assume that 



Case 1: Suppflose 
Then, we mufct 



and therefore , 



which is a contradiction 
Case 2: Supplbse 



This means 
particular, 



Appendix 4. A 



OPTIMALITY OF ALGORITHM Al 



uflas detected by Al, and let L be the next boundary detected. We will 
7^ Jjfc+i and arrive at a contradiction. We will consider three cases: 



: Let 5* = {h,...l n } be the optimal boundary configuration, and 
jt, for k < n was detected by Al. Then, / fc+1 will be the next boundary 



Al detects L at ; < k +1 . 
have that 

U p (j) > U P (L) + U m U) ~ Um(L) + 2 



UWL:.,lk,L,j,l k+u ...}) <U(S m ) 



that 



Al detects L at j e (k+i, k+2]- 

at ; we had that L was the optimal location for the boundary. In 



U p (l k+l ) + U m U) ~ U m {lk + l) > U p (L) + UmU) ~ U m {L) 
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which impli ;s that 



and therefoj e 



which is a o )ntradiction 



Case 3: Suppose 
Then, we mlist 



that Al has not detected any new boundary at j 
have: 

U p (l k+2 + 1) < U b (l k+2 + 1) + 1 

which meant that 

U({h,...l k ,l k+3 ,...}<U(S*) 

which is again a contradiction, i 



Proposition 

the boundailes 



Proof: 

Let/*,/ A1 b 

Let 



> 



Case 1: Lo 
l, ij e s* foil 



Case 1-a: /*( 



P (L) + U m {l M ) - U m (L) < U p (l k+l ) + U m {l k+2 ) - U m (l k+l ) 



U({h,...,l k ,L,l k+2 ,...}<U(S*) 



= k+2 + 1. 



I: If Al runs from left to right starting at a point Z . and generates 
{fi, k,..-}. then, lj e S* (the set of boundaries of the optimal 
configuratioif) for j > 2. 



s the optimal configuration, and the one generated by Al, respectively. 

Lo = sup{; G 5* : j < h} 

L = mf{jeS* : ;>/i} 

If Lo = Z , ve apply proposition 1 and finish the proof; so, let us assume that 
Lo?£lo, and that h was detected at i. We have two cases: 



l . We claim that in this case, heS*, and therefore, by proposition 
j > 1. To prove this claim, we consider two subcases: 



[k,Lo))^fAi{lk,Lo)). 
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In this case, 



and therefor 



2 + U m (i) - U m {h) + U p {h) - U p {Lo) < U p {i) - U p (Lo) 

which implies that h€S*. 

Case 1-b: /|(/ , lo)) = />ii((ta,Io)). 

Suppose h 4 S*. We have that, at location », 

p(Ji) + U m (i) - U m (h) + 2 < C/ p (Lo) + J7 m (t) - U m {Lo) + 2 



£ 



since otherw 
this implies 



hat 



which meang 
Case 2: Lq 
Again, we 
Case 2-a: /* 
LetC/+,£/_ 



Note that 



we have: 



2 + U m {i) - U m [h) + U p {h) < U p (i) 



se, Lq would have been a better location for the boundary. However, 



U p (h) + U m (L) - U m (h) < U p (Lo) + U m (L) - U m (h) 

that we can improve S* by moving Lq to h, which is a contradiction. 

Jo. 

cdlnsider two subcases: 

(A>,W) = /iii((4>,y). 
the energy increments with respect to Lq: 



te 



tf+(0 = E *+*fe) 



y=Lo 



tf p (i) = u+{%) - U+{1 ) and 
C/ m (t) = U-(i) - C/_(/ ) 
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Since h wai 



and therefo e 



2 + U-{i) - U.(h) + U+(h) < U+{i) 
which mear 5 that h G5*. 
Case 2-b: /fl((Wo)) 7^ /ai((^o, *o)). 

definitions for U+,U- t we have that, by the optimality of 5*, for 



Using the same 
some j > Z| 

and therefoi e 



It shou 
Al runs backwards 
complete oppmal 



Algorithm A l: 



1: Run Al 
2: Run Al 



detected at i, we have: 



2 + tf m (t) ~ MM + U p (h) < U p (i) 



U.(j) - U.(L) + t/+(L) + 2 < U+{j) 



U-U) - U-{L) + U + (L) - U + (h) + 2 < tf + (L) - C+('i) 

which meadb that if Al detects l u it must detect L too, unless it detected l 2 first, 
but in this qpse we have that, for some p < j, 

U-{ P ) - U-{1 2 ) + U + {1 2 ) - U+{li) + 2 < U+{p) - U+{h) 
which implies that l 2 €S*. This completes the proof. 1 



d be clear that these results can be easily extended to the case where 
(from right to left). With this extension, we get the following 
procedure: 



f om left to right. Detect {l lt . . ., l n }. 
bjackwards (starting from Z 2 ). Get either 

{*2,...,U or {Zi',/ 2 ,...,U 
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In either ca* e, this is the optimal solution. 



The on 
optimal 
the following 



y thing that remains to be proved is that the determination of the 
loc&ion for a boundary is in fact performed by step 2.1 of Al. We have 



Proposition 

the optimal 
si = —A; anfl 
likelihood 
optimal location 



Proof: 



First, we 
location of 



3: Suppose that Al detected a boundary at (or started from) / . Then, 

ocation / of the next boundary has to be updated only at places where 

siml = k (note that in si we have stored the value of the maximum 

eftimate f^ L , while siml = f}L[). Suppose i is one such place. The 

will be: 



={;;'■ 



if U p (i - 1) - U m {i - 1)< u pi - u ml 
(the current value) otherwise 



nc te 



tie 



or equivalen ly 



vas 



Suppose / 
consider several 



that a necessary and sufficient condition for / to be the optimal 
boundary at the point i is that, for j £ [/ , % — 1]: 

u p (i) + MO - MO < u p (j) + MO - Mi) 



Case 1: sim = —k 



In this case, 
By construction 



U P (l) - MO < UM) - Um(j) 

the optimal location at i — 1, and we process observation i. We 
cases: 



we show that / remains the optimal location: 
, we have that: 

J7 p (»-l) = l/p(.'-2) + * +4 ( W -. 1 ) 

U m (i - 1) = U m {i - 2) + ¥-*(a-i) 
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Since siml = = —k we have that, 



and therefor \ y 



U p (i - 1 



*+fc(g.-i)-*- fc (w-i) >0 



so that / 
Case 2: simi 



> U p {i - 2) - U m (i - 2) > U P {1) - U m {l) 
remains the optimal location. 



In this case \ 'e have that 



liat 



This means 
will be obtained 
by theorem 
be placed, it 
suppose sim 

If 



then, 

because I wag 1 
token, it is 



- U m (i - 1) = U p {i - 2) - U m (i - 2) + * +jfc (g x _ 1 ) - *_*(<7,-_i) > 



= k 



U p {i - 1) - U m {i - 1) < U p (i - 2) - C/ m (z - 2) 



char 



the minimal value for U p {i) - U m {i) on a block for which si = k 

at the extremal point where si = -A; and siml = Jfc, and since, 

L of appendix 4.B, this is the only point where a boundary might 

is sufficient to update the optimal location only at these points. So, 

= k and si = — k. 

Upi - U ml < U p (i - 1) - U m {i - 1), 



U p i - U ml < U p {j) - U m {j) for ; e [l , i - 1] 

the optimal location outside the last block where si = k. By the same 
that if 



U p i - U ml > U p (i - 1) - U m {i - 1), 



the new optii lal location will be i — l.| 
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Appendix 4.B 

EjYNAMIC PROGRAMMING FORMULATION OF THE 
( )NE-DIMENSIONAL MAP ESTIMATION PROBLEM 



In this appendix we present an algorithm for finding the global minimum of: 

(1) 



which, based 
of one 



on dynamic programming principles, reduces the problem to a sequence 
dimensional optimizations. 



As we 
which can bd 
coarse 



vs ill 



the optimal ( 

This aprilroach 



A co 
L n defined 



br 



We will call 

these 

an equivalent 



For a fixed n 
boundaries, 






see, this algorithm generates, as a byproduct, a family of solutions 
considered as descriptions of the field / at different scales, so that the 
descriptions, which are computed very fast, are progressively refined until 
i nest scale) configuration is found. 



is based on the following idea: 
nfi^iration / is completely characterized by the value of f u and the set 



U = {L : / L ^U ; |£„| = n. 



(2) 



he n elements of L n the "boundaries" of the configuration /. Since 
boundaries correspond to odd bonds between neighboring cells, we can define 
energy function as: 



with U(f) = -£ ♦,,(»), f { elko.kt} 



(3) 
(4) 



, U depends only on the value of /i, and on the position of the n 
tftat is, onn + l variables. To make this dependence more explicit, let 
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us define the 



Let U and I \ denote the energy functions corresponding to the configurations with 
/i = k\ and|fco, respectively, for a given set of boundaries 



We have tha|, for n even, 
U (n, 



functions 



y=i 



(5) 



L n = {Li, . . .//„}, L\ < . . . < L n 



(6) 



N 



-n) = n + f[E^ofc)+ E **,(*■) + .-.+ E **.fe)l 



J=l 



Li + 1 



L«+l 



N 



= n + £[<?(£!) _ G(L2) + . . . - G(L n ) + £ * fa (g y )] 



J=l 



a 



Li 



N 



^(n,.Ln) = n+ ? [X:**ife')+ E **.(*/) + ».+ E **.(#)] 



and for n odi I, 



U (n, L 



y=i 



Li + l 



Ln + l 



H*. U) = n + -[-G(L!) + . . . - G(L n ) + £ **(*,-)] 



a 
2 



JV 



n + -[-C(Lj) + . . . + G(L n ) - G(AT) + £ **.(*,-)] 

y=i 



(7) 



2 



iV 



) = n + -[C(L!) - G(Ia) + . . . + G(L n ) - G(iV) + £ **ofe)l 

y=i 



2 



AT 



(8) 



(Note that E 

Let sj?>, 
Then, the optfmal energy for a given n is: 



^jb (gy) does not depend on /). 
|Sv be the sets of boundaries that minimize U and Uu respectively. 



Ul = min[U Q {n,sW) i U,{n,sW)) 



(9) 
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W( 



We will defifie S n to be the corresponding optimal set of boundaries. 

det|rmination of Sir is an n-dimensional optimization problem. However, 

show below, it is possible to decompose it into a sequence of one 

optimizations using a dynamic programming formulation. With this 

aJso get, as a bonus, the solutions sf\...,S^l lt k £ {0,1}, and 

corresponding optimal energies. If we set n = N, the solution to the original 

U*(n*,S n *) can then be found by a one dimensional search. This 

ho\|ever, can be dramatically improved by the use of the following facts: 

•educe substantially the search space for the location of the optimal 
bou n dalies Lj 6 5 n « 

sequences {U* lt U\, . . .} and {U* 2 , U\, . . .} are unimodal. This, together 
fact that the dynamic programming algorithm uses Sy_i to compute 
provides us with an efficient stopping criterion for the computation of 
seqience {5i,...,5 n «}. 

p icted value of n is usually small. 



The 
as we will 
dimensional 
approach 
the 
problem (3) 
strategy, 

(i) We can 
bou 

(ii) The 
with the 
S j 
the 



(iii) The ex 
We will 



Let 



(Note that P 
changes sign) 



now describe the algorithm, and analyze each one of these facts. 



1. Search Sp ice for the Optimal Boundaries. 



/ > A/ = {M 1 ,M 2 ,...} = 
= 4': G{j - 1) < G(j) > G{j + 1), with G{j - 1) ^ G(j + 1)} 

p m = {mi,m 2 ,...} = 
'': G(j - 1) > G(j) < G(j + 1), with G(j - 1) ^ G(j + 1)} 



(10) 



(11) 



(Convention* lly we include j = 1 in P M , if < G(l) > G(2), and include it in P n 
if > G(l) < G{2)). We define the set P as 



P = PM{JPm = {Pl,- ..,^r} 



:orresponds to the set of places where the sequence {$k {9j) - ^kAgj)} 
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In what) 
'minima", 



a id 



Let S„. 



n - 



(£„*_) denote the subsets of S n > formed by those boundaries Ly 
whose corresponding term G(Lj) has positive (negative) coefficient in £/*., i.e., if 



then, 



With th :se definitions, we have: 



b^ 



and let /* b ; 



S n ' + — P n 



or P k+1 e P n 

than S n * (we 
then either 



ge: 



and so, we 
decrease sim 
or Lj e (P r , 



follows, we will call the elements of P M% P m and P % the "maxima", 
"critical points" of G, respectively. 



Sn + = {Ll + k> I<l+kt • • •} 



(12) 



Theorem 1: ! iuppose that Q ko {gj) - $jfc,(gy) 7^ 0, for all j (a situation that will occur 
with probability 1 for most observation models). Then, S n - + C P m and S n *_ C P M . 

To see \|hy this is true, let f ML denote the maximum likelihood estimate for 
/ obtained 



the optimal estimate. Suppose that for some j we have, say, Lj e 
Siippose Lj e (F fcl P*+i). for some P k , P k+1 e P. Clearly, either P k £ P m 
. Suppose, for definiteness that P k e P m . 



fr L =\ ku 

lo, 



otherwise 



p n ., the configuration {L u . . .Lj- U P k ,Lj+ lf . . .L n -} has lower energy 
decrease U without altering n), which is a contradiction. If P k e S n % 

f\(Pk,Lj))^f ML ((P k ,Lj)) 
or f*((Lj,P M ))^f ML ((Lj,P k+l )) 



a lower energy configuration by deleting Lj and either P k or P fc+1 (we 
dltaneously n and J7). A similar argument can be used if Ly e [1,P\) 



/ r ].| 
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This relult means that we can use P to constrain the search space for the 
boundaries of each subproblem (i.e., for each fixed n), which now becomes: 



th it 



Note 
will coincide 



For n <f \P\ fixed find S n = {L i} . . .L n } with 

S n+ C P m andS n -. C P M 



such that U( n, S n ) < U(n, L n ) for all L n C P. 



2. Dynamic *rogramming (DP) Algorithm. 



>ptim ll 



From 
of the opti 
the optimiza 

For s(? ] 



equations (7) and (8), it is clear that, for any fixed n, the determination 
(constrained) configurations s!P, S$ is equivalent to the solution of 
:ion problems: 



with Li, 
For S { n ] 

with Li, 



(13) 



theorem 1 guarantees that the constrained and unconstrained solutions 
only for n = n*, so that for n^n \S n may, in general, be suboptimal. 



Minimize [G{L{) - Gfo) + . . .] 
£3, • • • € P m , and L 2 , L 4 , . . . € P M . 

Maximize [G{L{) - G{U) + . . .] 
L3, . . . G Pj« t and L2, L 4 , . . . G P m . 



Let us c Dnsider the maximization problems. Assume, for definiteness that the 
first critical qoint of G is a maximum, i.e., Mi <m u and define the sequences: 

Z>i(it) = supG(M,) 

*>ib 

Lj(A:)=:{minL : G(M L ) = D X (A;)}, fc = l...|P M | (14) 

Clearly, M Ll(1 ) is the optimal location of the boundary for n = 1 (i.e., 
Sp) = {M Ll !)}), and from L>i(l) we can easily compute the corresponding energy. 
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We now def ne, for j > 1: 



and 



for k = l, 



and the optii lal energy is: 



For n even, ve define: 



D 



and get: 



D 2j (k) = sup{D 2y _ 1 (z- + 1) - G(mi)} 

i>k 



D 2j+l (k) = sup{D 2j (i) + G(Mi)} 

%>k 



L 2i {k) = {mmL : D 2j {k) = D 2j -i(L + 1) - G(m L )} 
L 2j+l {k) = {mmL : D 2j+1 (k) = D 2j {L) + G{M L )} 

> \Pm\ - h One can check that, for n odd, 

S { n ] = {M Ln(1) ,m Ln _ l(An(1)) , . . ., M Ll(M ... (Ln(1)M } 



a 



^iW = n+-[-D n (l) + X:*fafe)] 



Z>'i(*) = sup{-GK)} > fc = l,...,|/> m | 



t>* 



L' 1 (/c) = {minL : Z>i(ib) = -G(m L )} 
\ j {k) = sup{D' 2j _ 1 {i) + G(M i )} , * = l,...|/> w |-y + l 

»>Jb 

ii y (fc) = {mini : D'^k) = D' 2i _ l {L) + G{M L )} 
D' 2j+l SMv{D' 2j (i + l)-G( mi )} , k = l,...\P m \-j 

i>k 

^2j+l( k ) = ( mi n i = #2>+i( fc ) = ^2j - G(m L )} 

S { n ] = {^(i),.. •™L 1 (L;(....L; v (1))...)} 

0i(") = n + I [-Bl(l) - G(/V) + £ * to ( 9y )] 
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(15) 



(16) 



(17) 



(18) 

(19) 
(20) 



For the minim 
that Mi < 77 



and for ; > 



l^k) = {min / : d { {k) = -G(m/)> 

d 2j (k) = mf {rf 2 y-i (0 + C(M,)} 

/ 2y (A:) = {min/ : </ 2 y(A;) = rf 2 y_i(0 + G{M t )} 

d2 3 +i{k) = mf {d v (t + 1) - G(m t -)} 

hj+i{k) = {min / : d 2j+l (k) = d 2j {l + 1) - <S(m/)} 

with k varyiifg in the appropriate range. The solutions are: 



For n odd: 



The 
allow us to 



ization problems, that is, for the computation of S$\ assuming again 
i, we have, for n even: 



dt{k) = inf ;{-G(m,-)} 



a 



t/ (n) = n + -[d„(l) + £$*„(?,)] 



<*,(*) = inf {G(M,)} 

4y(*) = ,h£{4;-i(* + x ) - G K)> 
4y+i(*0 = inf {4,(i) +G(M,)} 



with the corresponding definitions for /}(&). The solutions are: 

4 0) = {M /n(0 , . . ., m /1iMU1)M } 



a, .» 



Ob(n) = n + -[4(1) - G(/V) + £ **„(<,,)] 

^ y 

The case for ■ vhich mi < Mi is treated in a similar way. 



(21) 



(22) 



(23) 



(24) 



recu rsions 



(15), (18), (21) and (23), together with equations (9) and (10), 
cjtmpute the sequences {S lf S 2) ...} and {U* V U* 2 ,...} using only one 
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dimensional 
value n for 



optimizations. We now turn to the problem of determining the optimal 
the number of boundaries. 



3. Stopping < 



res j It 



This 
programmin 
minima for 
we can termtiate 



In this |ection we prove the following: 

Theorem 2. ! luppose that every (constrained) optimal configuration in the sequence 
{5i,5 2 ,...} s unique (i.e., for every n, if S' n ^ 5„, and S' n C P, then U(n,S' n ) > 
U* n ) and thai for some n, U* n+2 > U* n . Then, U* n + 2k > U* n , for all k > 1. 



will provide us with an efficient stopping criterion for the dynamic 
recursions described in the previous section; since the first local 
he subsequences {U* l} U* 3) ...} and {U* 2 ,U\,...} are the global ones, 
the computations once we have found them. 



To prov| the theorem, we will need the following lemmas: 

Lemma 1. let S k = {L lt ...,L k } and S k+2 = {L[, . . .L' k+2 } be the optimal 
boundaries ( vith corresponding configurations f k and f k+2 ) for n = k and n = 
k + 2, respertively. Suppose that k + 2 < \P\. Then, S k C 5^+2 (i.e., 5^+2 is a 
refinement or S k \ provided S k is unique. 



Proof: 

We will 
We consider 



riterion. 



assume that for some j, Lj eS k - S k + 2 , and arrive at a contradiction, 
three cases: 



Case 1: Suprjose that for some i, 

M+i]n* = * 
In this case, |^e claim that we can find some index p such that 
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and 



Suppost that this is not the case. Then, Lj-,LJ +1 are the only elements of S k+2 
in some intjrval (L ; ,L J+1 ) (or in one of the extreme intervals [l,Li),(L k ,N]) and 



Suppos \ 



By conditior 
and minimum 
a configurat 
by moving 
similar 



(13), we have that Lj ^ L' i _ l (otherwise, Ly would be a local maximum 

of G at the same time). But then, since S k is optimal, we can find 

on with k + 2 boundaries whose energy is lower than that of S k+2 , 

• { to Lj (or L' i+l to L J+ i), which contradicts the optimality of S k+2 . A 

argument holds if 



This proves pur claim. 
So, suppose that 

and 

Form 



bi 



and let f k 
A(l) (and th 

Let AU 



a +2 ((l p ,v m ])^/,((l;,l; +1 ]) 



W(44i])^/#U.>i]) 



[Li> A"+i] ^ {Lj,Lj +l ) 



[li,£ i+1 ] C [1, Li) or (L k ,N] 



[^^+iin*=i 



/ fc+2 ((L;,L; +1 ])^/ fc ((L;,L; +1 ]). 



£* — {-^lf •)^p-l>^ / p+2»"->^'Jfc+2} 



the corresponding configuration, chosen in such a way that f k (l) 
before, f k {[L p , L p+l ]) = } k {[U p ,L p+l })). 

be the change in U (see eq. (4)) associated with setting: 

/([l;,l; +1 ]) = /, +2 ([l;,l; +1 ]). 



109 



We hav< i that 



Now, we pul : 



Since S k is o 3timal, we have that: 



which contra iicts the optimality of S*+2- 



Case 2: 



Suppose that 



Otherwise, if 
are in case 1 
configuration 

So, 



and let f k be 



U(S k+2 ) = U{S' k ) + AU. 



Sk+2 — v^i> • • •> Lj> L p) L p+l) . . ., L k }. 



•* . _ » 



U{S M ) = U(S k ) 4- AC/ > U(S k ) + AU = U(S k+2 ), 



L\ e [1, L\). We must have 

fk + 2([l,L\))^f k ([l,L\)) 

L\ = L 2 » condition (13) generates a contradiction; if L\ > L 2 , we 
and if Li < L 2 , S k+2 is not optimal, since we get a lower energy 
by moving L\ to L\. 



a +2 ([i,l;d^ /,([i, l;d. 

By a sim|lar argument, we get that 

A+2([^ + 2iAn)^/*([4+2,^). 

Now, proceeding as in case 1, we form: 

S k = {L 2 , ...,L jfc+1 } 



the corresponding configuration, chosen in such a way that f' k (l) = 
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Let AU 



so that 



Now, we for n: 



Case 3: 



To make (*) 
closed interviUs 
hold, we canttiot 
so, our proof! 



Since S k is optimal, we have that: 

U(S k+2 ) = U(Sl) + AU> U(S k ) + AU = U(S k + 2 ), 
which again contradicts the optimality of S k+2 . 



be the change in U associated with setting: 
m\,L 2 ]) = fk + 2(lL\,L 2 }) and 

f([ L k+l> L k+2\) = fk+2{[L k +l,L k+2 \) 



U(S k+2 ) = U(S k ) + AU. 



Sk+2 — \Li j L\ , . . ., L k) L k+2 }, 



For all i, [liX+ilClSk^b 
and ([l,L[]{j[L k+2) N])f)S k ^^ 



(*) 



hold, we must be able to place k boundaries in A; + 3 (ovelapping) 
, without omitting any interval. Moreover, since condition (13) must 
put Lj = L\ and L J+1 = L' i+2 for any i,j. But this is impossible; 
is finished, i 



Lemma 2. l*\AU k = U{S k )-U(S k+2 ). Then, AU k < AU k . 2 , for all k e [3, \P\-2], 

Proof: 

Considei the optimal configurations S k , S k + 2 , Sjb+4, and suppose that AU k+2 > 
AU k . Using 1 ;mma 1, let 

Sk = {Li,...,L k }', 



' _ » _ » 



Sfc+2 — {L\, ■ ..,L lt L 2 , . . .,L k }. 



Ill 



By 
consider eaclk 



condition (13) and lemma 1, there are only two valid forms for S k+i . We 
case separately: 



Case 1: S k+4 



(i.e., the refinements corresponding to S k+2 and S k+4 are disjoint). 
Then, 



fcr 



we have 



which is a cc ntradiction 



Case 2: S k + 4 



(i.e., Sjt+4 is 
Let 



We have thai , 



By assumptio i, 



U(S' k+2 ) = U(S k ) - AU k+2 < U(S k ) - AU k = U(S k+2 ), 



is of the form: 



Sk+4 — {Ll, • • *)L p) 1*1,1*2, Lp+\. . .,Li,Ij2, . . .} 



Ojfc+2 = {Ll, • • -jL p ,L p+ l,. • -,L 1 ,L 2 ,. . .}, 



is of the form: 



> _ » _ » _ » 



Sfc+4 — {Lit ' • >}Lj,Li,Li,L2,L2, • ..} 



i subrefinement of the refinement introduced by Sjb+ 2 ). 



a = 



6 = 



c = 



-U({L lf . . ., Lj, L\,Li,L j+1 , . . .} + U(S k ) 
U{{L U . . , Ly, L'l L 2 , Ly+i, . . .} - U{S k ) 
-Cr({L 1 ,...,Ly,L 2 ',L 2 ,Ly +1 ,...} + C/(5 j fc) 

AU k = a + c-b 
AUk+2 = b. 

b > a + c — 6 
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and therefor j 



Now, let S^ 2 be formed from S k by the refinement: 

{Li f Li, ifa = max(a,c) 
L 2 , L2, if c = max(a, c) 



Then, 



* r (S*+2) = U{S k ) - max(a, c) < U{S k ) - AU k = ^(Sjk+a), 



which is a contradiction. 



Now we 



Suppose 



4. Expected 1 alue of n\ 



First, we 
n of odd bonBs 



as: 



Al/fc = o -f c — b < < max(a, c). 



prove theorem 2: 
u l+2 > u l> Then, 



* + 2 + |#(S fc+2 )> k+^U(S k ) 



now, by lemi ia 2 we have: 



t/f +4 = k + 4 + |#(S t+4 ) = k + 4 + |(#(S fc ) - A& fc+2 ) > 

> k + 2 + |(#(S fc ) - Afrfc+a) > A: + 2 + |(#(S fc ) - A&*) = 
= k + 2 + ^U(S k+2 ) = U* M 1 



compute the (prior) probability density function p(n) for the number 
in the original field /. 



Let N b = = N - 1 be the total number of bonds. We can rewrite equation (1) 



P{ u = /) = i e i(*-a«) 



(25) 
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The tots I number of configurations compatible with a given n is 2C£ r \ and so, 

= 2C^cxp[l(iV t -2n)) = 
E£i„C7^cxp[l(iV t -2fc)] 



which is a bi lomial distribution. Therefore, 



that 



We note 
exponentially 
too high, we 



= c *( * 1/a ) Nb ~ n ( <~ 1/a ) n 



(26) 



E[ n ] = N b ( — - ) 

Var[n] = N b ( - ) 



(27) 



as a | oo, E[n] t N b /2, and as a j 0, E[n] | (and var[n] I 0) 
fast. This means that if the natural temperature of the system is not 
can expect that n*, the MAP estimate for n, to be relatively small. 



5. Relation t< Multiscale Filtering. 

An intei esting characteristic of the DP formulation is that the solutions to 
each of the : ubproblems (which in fact correspond to a minimization of U (eq. 
(4)) are inde >endent of the value of the parameter a. The role of this parameter 
is to determ ne the number of regions (n*) that will be present in the optimal 
configuration In this sense, it can be regarded as a "scale" parameter that controls 
the aggregate >n of the subregions into larger units, and the algorithm can be used to 
produce muli iscale descriptions (in the style of the "fingerprints" treated in Witkin 
(1983) and \ uille and Poggio (1983)) of the input signals. (Several other heuristic 
solutions to t lis problem have been proposed. See, for example, Blumenthal et al., 
1977; Prazdn r 9 1982 and Pavlidis, 1973) 

If we ini erpret the algorithm in this way, it becomes natural to ask whether 
a family of li near operators can do the same job in a much efficient way. Let us 
formulate thi ; question in more precise form (in what follows, we will consider a 
"continuous 1 ime" problem obtained from the original one as a limit when N | oo 
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(provided th 
it simplifies 
discrete case] 



lie 



Considei a family of filters {F L } with the following properties: 

x) is a symmetric and non-negative function of x. 

L, F L (x) is a decreasing function of |x|, and F L (x) j as |x| | oo 
gh, so that F L can be approximated by a function with finite 



(i) Each F L 

(ii) For eacl 
fast enol 
support. 

(iii) All the Alters are normalized: 



(iv) The filters become sharper as L { 0: 

>6 rb 



Particula r 
©The 



fam ly 



Suppose 
approximation 
If we start wi h 



t the observations are different from only in a finite interval), since 
notation. It should be clear that the same arguments apply to the 



/oo 
Fi(x)dx = 1, for all L. 
-oo 



f Q F L2 (x)dx < f Q F Ll (x)dx 
implies that l^ > L\ 



examples of acceptable families are: 
of rectangular boxes B^: 



B L (x) = {U- if 1=1 <. L 
10, otherwise 



(ii) The fam ly of Gaussian Kernels: 



G L {x) = 



v/2tFL 



eXp '-2Z^] 



we convolve the function g(x) - \ (g(x) is a continuous time 
to the observations) with a set of filters from the family {F L }. 
L large enough, the function 



h L = {9-J * F L 
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this 



will be practif 
zero crossing: 
associate a boundary 
respectively, 
ignoring, at 
With additional 
them by decfeasing 
each zero 
the moment, 
correspond 



ally constant, and therefore, it will have no zeroes. As we decrease L, 

of h L will begin to appear. To each of these zero crossings, we will 

, and form the configurations Si, S 2 , • • • with 1,2,... boundaries 

i|hat correspond to the first, first two, etc. zero crossings of h L (we are 

is point, the question of the precise localization of these boundaries. 

contraints on the family {FyJ, it is possible, in principle, to localize 

L in a continuous fashion, and then tracing the position of 

crossing to the finest (L = 0) level; see Yuille and Poggio (1983). For 

let us assume that we can identify the zero crossings of g - \ that 

those of h L , for all L). 



to 



The que; tion that we ask is the following: 



If S U S 2 
algorithm ,is i 



u 



for all kl 



Consider 



Ul' 



and g(x) = 
such a way 
property (ii), 
4.B.1). 

Suppose 
single double 



. .. are the optimal boundary configurations produced by the DP 
true that 

A 

Sk = Sk 



As we nc w show, this is not the case. 



the signal g(x) defined by: 

g{x) = l , 

for x e [l u h + 2a] \J[l 2 , h + 26] [)[l 2 + 46, l 2 + 66] |J 

+ 86, l 2 + 106] (J[l 2 + 126, h + 146] Ufa + 166, l 2 + 186] , 



otherwise. Here, lul 2 ,a and 6 are some positive numbers chosen in 

tl at, if Lq is the starting L, we take l 2 - h - a > > Lo, so that, by 

i here is no interaction between [l lt l x + a] and [l 2 , l 2 + 186] (see figure 

hat the zero crossings corresponding to [/i,/i + a] appear first (as a 
zero) at L = Lu and those corresponding to [l 2) l 2 + 186] at L = 1^. 
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Figure 4.B.I. (Sec text). 



Then, 



j* F Ll {x)dx = j~ F Ll {x)dx 
jf F L ,{x)dx + J u F L ,(x)dx + J n F Lt [x)dx = 

/*36 y-76 i«oo 

= A F ^ x ) dx + ] 5b fi n (x)dx + J F L2 (x)dx 



(28) 



(29) 



Now, for a > 6, we have: 

#({*!,«) -106 

U{{k,U}) = &b + 2a>U{{l 1) l 2 }) 

and therefore, S 2 = {h,h}. 

We claim that we can find some a, b with a > b such that 

fj F L2 (x)dx < f^° F L2 (x)dx 

If this is true, we find, using (28) and conditions (iii) and (iv), that it implies that 
Li > L u and therefore, S 2 = {/3,/4>. 
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We now prove our claim: 
Let a = b + §, where we choose e so that 

•6+e/2 z-56 



/ 6 F L2 (x)dx = jf F Ij2 (x)dx (30) 



(property (ii) guarantees that we can find such e). From (29), 

/•oo /»6 rib rib 

y 6 */*(*)«** = y o ^ 2 (x)dx + 2 y 36 f^*^* + 2 y ?6 ^u^x 

and from (30), 

r° ^ w^ - C/2 f ^ x ^ x = r F ^ x)dx - t t/2 F <«w dx = 

•M-e/2 rib rb+t/2 



/•o ro-\-t/£ tvo fb+e/2 rib 

= /, FL 2 {*)d*+} b F L2 (x)dx+2 J n F L2 (x)dx = J q F L2 (x)dx+2 J n F Lt {x)dx> 

rib pa 

> J n F L2 (x)dx = ^ Fx, 2 (x)dx | 



This result does not mean, of course, that families of linear filters cannot be 
used for producing useful multiscale descriptions of signals; it only means that these 
descriptions cannot, in general, be considered as MAP estimates of MRF models. 



6. Continuous Valued Fields. 

In this section we present a related problem which can, in principle, be solved 
using the DP approach, although, as we will see, in a less efficient way. 

Let us consider the problem of estimating a piecewise constant signal corrupted 
by additive white Gaussian noise. We model the signal {/,} as a MRF with potential 



V{fi,f i+ i) = < u . 31 

U, otherwise 
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and global states distributed according to: 

P{F = f) = \ exp[-i £ V{fi, I M )\ 

The observations are given by: 

9i = /» + rii 

where n is a white Gaussian process. The Bayesian (MAP) estimate for / is again 
found by minimizing eq.(4): 

U(f) = n +lu 

where n is the number of places where /,- ^ f i+u and 7 = ^2 • Note that in this 
case, /,- is not restricted to {0, 1}, but can take any real value. 

Proceeding as we did in section 2, we consider the sequence of subproblems 
obtained by putting n = 0, 1, 2, . . .. 

For any fixed n, U will depend only on the n integer variables that correspond 
to the location of the boundaries between regions of constant /, since given these 
boundaries L = {Xi,...L n }, the optimal estimate for / on any interval (A-,A'+i] 
(we put Lq = 1 and L n+ i = N) is: 

1 £<+i 

If we define G^i (for k < /) as: 

G k ,i = (1 - 2(/ - *))( j4* ._£ *) ( 32 ) 

We get that: 

#(*.») = E ?? + E <V„ iy (33) 
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(note that £ gf is a constant for a given set of observations). Using dynamic 
programming principles, we can now write the recursions: 

F (k) = G ktN , k = 0,...N-l 

F j+1 (k)=M{G kti + F 3 {i)} , k = Q,...N-j-l 

L J+l (k) = {L : G kiL ^Fj(L) = F j+l (k)} (34) 

The optimal solution, for each given n is: 

s n = (Mo), l»-i(z»(o)), • • -, M^2(- • -(Mo))- • •)} 

and the corresponding energy, 

C/(n,S n ) = n+f[£g? + F n (0)] (35) 

The solution to our problem will be S n * , where: 

C/(n*,5 n .) = inf{C/(n,5 n )} (36) 

Unfortunately, in this case we cannot guarantee the unimodality of any subsequence 
of {U(S n )} (although we believe that the sequence will be unimodal in many cases) 
and so, (36) has to be computed, in principle, by an exhaustive one dimensional 
search. Another unpleasantness is that, unlike the binary case, the search space for 
the variables U cannot be reduced in any obvious way. 
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Appendix 4.C 
CONSISTENCY CONDITION FOR THE MPM ESTIMATOR 



In this appendix we present a proof for the consistency condition (given 
by equation (11) of chapter 4) satisfied by the MPM estimator of a binary, 
two-dimensional Ising net: 

Theorem: Let P(f, g) be the posterior distribution corresponding to the estimation 
of the first order, binary MRF / from the observations g which are obtained as the 
ouput of a binary symmetric channel: 

P(f, 9) = \ exp[- £ V{f it /y) - 7 EU - '(A - 9i))] 

•■•>■ • 

Let / be the MPM estimator for /. Then, for every site »*, 

E P(f,9)> E Hf.g) 



implies that: 



E P(fJ)> E PV.ft 



Proof: 
Let: 



U ~Ul. y-i 

(,) _ (a. i ¥> i 

' I/,-, y = i 

1 u-/„ y = i 
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1) We first prove that for all i: 



implies that: 



E^(/ (t 'U)>E^ (t 'U) 



E^(/ (0 ^ W )>E^ W ^ W ) : 

/ / 



Suppose that g ^ g® (otherwise, the above is obviously true). For any fixed /, we 
have that: 

P{f {t \g)-P(h®,g) = 

= K{exp[- £ V&, fj) - 7] - exp[ £ V(l - /,, fj )}} 

JEN* JEN, 

and 

P(f®,9®)-P{h®,g®) = 

= K{exp[- £ V(f if ft] - exp[ £ V(l - f it /,) - 7 ]} 
jeNi jeNi 

Where A" is a constant. Since 7 > 0, this implies that: 

P(/ {0 , 9) - P(h®, g) < P(/W, ,«) _ P{h «) } g{i)) 

so that 

E P{f®, 9 {{) ) - P(A« g(i)) > £ F(/(0, ,) _ p (A (0 f g) > 

2) Let r t = 1 - / t .. We now prove that if: 

E P(f,9)> E *U*) 



then, 



for all ;. 



E PV,9®)> E JU* W ) 
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For i = ;', part (1) applies, and for gW = g, the assertion is obviously true, so 
suppose i 7^ ;* and gW ^ g. We have: 

P(f®,g)-P(h® t g) = 
= A^exph E V(f { , / y )] - exp[ £ V(l - /,, /,-) - 7 ]} 

P(/(0,^))_P(/ l (0 >J7 (y) ) = Xl{exp[ _ £ k(/., /,.)]_ 

je^v.- 

exp[ £ F(l - / t , /y) - 7]} exp[- 7 (l - 2(/y - g y )) 2 ] > 

for some constant K it so that 

E ^(/ W . I7 (y) ) - P(hM, y W) > e"^ E P(/ W , g) ~ P(* w , y) > 



The theorem is now proved by assuming that 

£ P(f,g)> E Plf,g) 

and succesively replacing 9i by /,-, for i = 1, 2, . . .and using (1) and (2) to show that 
the corresponding inequalities hold at each step. 
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Chapter 5 
RECONSTRUCTION OF PIECEWISE CONTINUOUS FUNCTIONS 



1. Introduction. 

In this chapter we will illustrate the application of local spatial interaction 
models and estimation techniques that we have described to the solution of the 
general reconstruction problem that we introduced in chapter 1. To make this 
discussion more specific, we will consider a particular instance of this problem: the 
reconstruction of piecewise continuous functions from noisy observations taken at 
sparse locations. 

In this reconstruction, it will be important not only to interpolate smooth 
patches over uniform regions, but to locate and preserve the discontinuities that 
bound these regions, since very often they are the most important parts of the 
function. They may represent object boundaries in vision problems (such as image 
segmentation; depth from stereo; shape from shading; structure from motion, etc.); 
geological faults in geophysical information processing, etc. 

The most successful approaches to this problem (see Terzopoulos (1984)) consist 
of, first, interpolating an everywhere smooth function over the whole domain; then, 
applying some kind of discontinuity detector (followed by a thresholding operation) 
to try to find the significant boundaries, and finally, to re-interpolate smooth patches 
over the continuous subregions. 

The results that have been obtained with this technique, however, are not 
completely satisfactory. The main problem is that the task of the discontinuity detector 
is hindered by the previous smooth interpolation operation. This becomes critical 
when the observations are sparsely located, since in this case, the discontinuities 
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may be smeared in the interpolation phase to such a degree that it may become 
impossible to recover them in the detection phase. 

One way around this difficulty is to perform the boundary detection and 
interpolation tasks at the same time. In the method we will present, this is done 
by using a Bayesian approach, and including in the posterior distribution our prior 
knowledge about the smoothness of the function and about the geometry of the 
discontinuities, as well as the information provided by the observations. Before 
describing how this is done, let us formulate the problem in a more precise way. 

Consider a region n of the plane which is formed by a number of subregions 
separated by boundaries which are known to be piecewise smooth curves. Suppose 
that within each of these subregions, some property / (in what follows, we will refer 
to / as "depth") varies in a smooth fashion, presenting, at the same time, abrupt 
jumps across most of the boundaries. Suppose also that we have measurements for 
the values of/ at some discrete set of sites 5; these measurements will, in general, 
be corrupted by some form of noise. 

Our problem is then to estimate the values of/ on some finite lattice of points 
L C n, and to find the position of the boundaries, using all the available information 
in an optimal way. 

2. Posterior Distribution. 

To apply the general recontruction algorithms developed in chapter 3 to this 
problem, we need to cast it in probabilisdc terms. The main issue here is the 
representation of the concept of "piecewise continuity" in the form of a prior Gibbs 
distribution in a meaningful way. 

This could be done, for example, by modeling the function as a first order, 
continuous valued MRF with nearest neighbor potentials given by: 

Vi - fj)\ if \h - / y | < a and |t - j\ = 1 
V(f it fj) = < 6, if |/,. - fj\ > a and |t - j | = l 

0, otherwise 
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where a and b are positive constants such that 6 > a 2 , and for every pair of 
neighboring sites i t j t |/ t - - fj\ < a if both i and j lie in the same smooth patch, 
and \fi — fj\ > a, otherwise. 

This scheme, however, has the disadvantage of not allowing for the explicit 
modeling of prior knowledge about the geometry of the curves that bound the 
smooth patches (the fact that they should be piecewise smooth curves, for example). 
A more flexible construction involves the use of two coupled MRF models: one to 
represent the function (the "surface") itself, and another to model the curves where 
the field is discontinuous. A coupled model of this kind was first used by Geman 
and Geman (1984) in the context of the restoration of piecewise constant images. 
We will now describe their work in detail, and define a related model that can be 
used for our problem. 

2.1. Coupled Line and Depth Models. 

In Geman and Geman's work, the intensity of the images is modeled using a 
first order MRF with generalized Ising potentials (see chapter 4). The boundaries 
between constant regions are modeled using a "line process" /, which is a MRF 
whose associated random variables are located at the sites of the dual lattice of 
lines that connect the sites of the original intensity lattice (see figure 12). These 
variables may be binary (indicating the presence or absence of a boundary between 
two pixels), or may take more values to indicate the orientation of the boundary as 
well. In both cases, their function is to decouple adjacent pixels, reducing the total 
energy if the intensities of these pixels are different 

This is done by modifying the prior energy function; the new expression is: 

uoif, i) = E E VsUi, fv Us) + E VcM (1) 

t jew a 



where 



w,/,y=p ( , )/y)> 



in t7 is"on" 
otherwise 
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Figure 12. Dual lattice of line elements (sites denoted by x) 
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Figure 13. (a) Cliques for the line process used by Geman and Gcman. (b) Additional cliques 
used to prevent sharp turns. 



V is defined in equation (1) of chapter 4: 
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-1, if |t - y| = 1 and /,- = / y 

V(/,-,/y) = a, if |t-i| = land/ t -^/y 

0, otherwise 

kj is the line element between sites i and ;, and the line potentials V Cl have as 
supports cliques of size 4, such as the one shown in Fig. 13-a. Every line element 
(except at the boundaries of the lattice) belongs to 2 such cliques. The values of 
the potentials associated with each possible configuration of lines within a clique 
must be specified. Thus, for example, if straight horizontal and vertical boundaries 
are likely to be present, a binary process, with potential values as those of Fig. 
14 is used (rotational in variance is assumed). In more general situations (such as 
piecewise smooth boundaries), we may use different values for the potentials, or we 
may allow more states for the line elements, corresponding to different orientations, 
augmenting consequently the table of values for the potentials. 

2.2. Models for Piecewise Continuous Functions. 

The model we have described can be adapted to our problem by modifying 
the choice of the potentials and the neighborhood structure of the coupled MRF's. 
Specifically, the following modifications are needed: 

1. Since in our case the observations are sparse, it becomes necessary to expand 
the size of the neighborhoods of the line field, to prevent the formation of "thick" 
boundaries between the smooth patches (i.e., adjacent, parallel segments of active 
lines in these regions). In particular, we propose that the dual lattice be 8-connected, 
with non-zero potentials for the cliques of the form illustrated in figure 13 (a) 
and (b). The inclusion of the cliques of figure 13-b has the additional advantage 
of penalizing the occurance of sharp turns, permitting us to model the formation 
of piecewise smooth boundaries (a more general case) using a binary line process 
instead of the 4~valued process proposed by Geman and Geman. The potentials 
for these cliques are computed in the following way: 

Let V a , V b denote the potentials associated with the cliques C a , C b of figure 13 
(a) and (b), respectively, and let S k (k e {a, b}) denote the number of line elements 
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v* 3 = 1.2 v 4 = 2.0 v 5 = 2.0 

Figure 14. Potentials for the different configurations of a line process 



belonging to C k that are "on" at a given time, i.e., 



S* = £ /,- , k = a, b . 
The potentials V k are given by: 

V k = /3<j> k (S k ) , k = a,b (2) 

where /3 is a constant, and the functions </> k are defined by the following tables: 
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S a 1 2 3 


4 




<f> a 0.4 0.25 1.2 


2.0 




S b 1 2 




* 


<fo o 10 







It is not difficult to see that this choice of potentials (notice that V a will be 
slightly different from the definition of figure 14) will effectively discourage both 
the formation of thick boundaries (St, = 2) and the presence of sharp turns (S a = 3 
and/or S b = 2). 

2. The potentials of the depth process, which is now continuous-valued, have to be 
modified to express the more relaxed condition of piecewise continuity (instead of 
piecewise constancy). Specifically, we propose: 



V { f i ,f J ;l ii ) = l ifi U 
10, 



- /y) 2 (l - <o). for |« - j\ = l 
otherwise 



(note that l {j e {0, 1}) 



3. Unlike the case of piecewise constant surfaces, we now have to worry about the 
maximum absolute difference in the values of two adjacent depth sites that we are 
willing to consider as a "smooth" gradient (and not a discontinuity). This value, 
which in general is problem-dependent, determines the magnitude of the constant 
P in equation (2), which can be interpreted as the coupling strength between the 
two processes. 

2.3. Model for the Observations. 

We will adopt the general model described in section 2.1 of chapter 3 to 
represent the observation process. In particular, to make the discussion more 
specific, we will assume that the observations g correspond to samples of the surface 
/ taken at a set 5 C L of sparse locations, corrupted by a zero mean, white, additive 
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Gaussian noise process: 

9i = fi + "i , 

so that the conditional distribution is: 

Pf\M> /) = II "T^r «p[-(/,- - 9i) 2 /2a 2 } 

ieS y/27T cr 

our results, however, can be extended to handle other noise models as well. 
Using Bayes' rule, we can finally write the posterior distribution as: 

Pf,l\ g {f, l '>9)=-^ exp[-U F (f, I; g)] 
with 

+ A £(/; - 9if + £ v a (i) + £ W) (4) 

ZG ies c a c b 

V a and Vi are the potentials corresponding to the V and "6" type cliques of the 
line process, and are defined by equation (2). It is convenient to introduce a function 
q which is equal to 1 only at those sites where there is an observation, and is equal 
to zero elsewhere (i.e., q is an indicator function of the set 5): 

ifieS 
otherwise 



J 1 - 

lo, 



(5) 



Using this function, and the definition of V from equation (3) we get: 



t- 53 E( A - ») J w + E vj® + E v h {i) (6) 

za ieL c a c b 
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3. Optimality Criterion. 

We can now apply the general principles developed in chapter 3 to derive the 
optimal Bayesian estimators for the depth and line fields. As a performance criterion 
we will use a mixed cost functional of the form: 

«m(/, ',/,<)= E (A' - hf + E (1 - % - */» ( 7 ) 

ieLf jeu 

where L/,L/ denote the depth and line lattices, respectively. This error criterion 
means that the reconstructed surface should be as close as possible to the true 
(unknown) surface, and that we should commit as few errors as possible in the 
assertions about the presence or absence of discontinuities. 

Appllying the results of section 5 of chapter 3, we find that the optimal 
estimators will be the posterior mean for / and the maximizer of the posterior 
marginals for /. Note that these estimates must be computed by averaging over all 
possible values of both / and /: 

4. Monte Carlo Algorithm. 

There is one serious difficulty that prevents us from applying directly the 
general Monte Carlo procedure that was derived in chapter 3 to the computation 
of these optimal estimates: since the depth variables are continuous-valued, if we 
discretize them finely enough to guarantee sufficient precision of the results, the 
computational complexity of either the Metropolis ar Gibbs Sampler algorithms 
will be very large. One way around this difficulty is to note that for any fixed 
configuration of the line field, the posterior energy becomes a non-negative definite 
quadratic form: 

U{f\l,g)= E Ui-fi? + «Y.Ui-9i? + K (8) 

i,j:lii=0 jes 
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where a and K are constants (note that the first sum is taken only over those pairs of 
sites whose connecting line element is "off", and the second one over the set S). This 
means that the posterior distribution of the depth field is conditionally Gaussian, 
so that we can find the optimal conditional estimator f*{l) as the minimizer of (8) 
(for a Gaussian distribution, the posterior mean and the MAP estimate coincide). 
If/ is identically zero (no lines), this function is strictly convex, and therefore it has 
a unique minimum. Let f* be the corresponding global minimizer. For any fixed 
configuration /, the gradient of (8) is given by: 

^^ = 2 £ (/,- - / y )4y + 2a ft (/ t - - *) (9) 



where 



N t - = {y : |t-j| = l} ; 



C t j J. t,j 



Setting this gradient equal to 0, we find that any minimizer of U will be a fixed 
point of the system: 

AM) Zie Ni lijA k) + <*gygj _ 

and ff +1) = ff otherwise (10) 

We note that the updating scheme (10) will produce a decrease in the value of 
U(f | /), regardless of the sweeping strategy. In a synchronous scheme (where all 
the sites are updated at the same time), the energy increment will be: 

AU(f | = C/(/( fc+1 ) 1 1) - U(fW 1 1) = 
= -2E(EAy + «ft)(/i* +1) -/{ fc) ) a - 

-E^^(/! t, -/! t+i, x/f-/r ) )<<' 
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because U is non-negative definite. For an asynchronous strategy, where /(* +l ) is 
obtained from fW by updating only the site i, we get: 

AU(f | = -4( £ Uj + «*)(/? +,) " f? ] ) 2 < 
jeNi 

Therefore, if we set 

/ (o) = /; (ii) 

the dynamical system defined by (10) and (11) (with a given sweeping strategy) will 
be stable and have a unique fixed point f\. 

Note that, since U(f | I) is always convex, f] will be a global minimizer (see 
Luenberger (1973)), but in general it will not be the only one; there may be cases 
in which some region Q within which there are no observations is isolated from the 
rest of the lattice by the line process. In this case, any solution for which 

fj = constant , j £ Q 

will also minimize U(f | /). However, for a fixed initial state /(°) the deterministic 
dynamical system (10) will always converge to the same solution, so that the 
configuration /*(/) is well defined. 

Let us define the set F* as: 

F' = {(f,l) ■■ / = /,*} 

It is clear that, if /, 1 are the optimal estimates for our problem, we have that: 

{f,i)eF 

which suggests that we can constrain the search for the optimal estimators to this set. 
This can be done, in principle, by replacing the posterior energy with the function: 

U\l) = U(fll) 



134 



(which depends only on /), and use the standard Monte Carlo procedures to find 
the optimal estimator /. To illustrate this idea, let us consider the following physical 
model: 

It is a well known fact the the steady state of an electrical network that contains 
only (current or voltage) sources and linear resistors will be the global minimizer of 
a quadratic functional that corresponds to the total power dissipated as heat (Oster 
et al, 1971). It is therefore possible to contruct an analog network that will find 
the equilibrium state of the depth field for a given, fixed configuration of the line 
process, i.e., that will minimize the conditional energy (8) (see Poggio and Koch, 
1984). This suggests a hybrid computational scheme in which the line field (whose 
state is updated digitally, using, say, the Metropolis or Gibbs Sampler algorithms) 
acts as a set of switches on the connections between the nodes of the analog network 
whose voltages represent the depth process. In particular, if /,- represents the voltage 
at node i t the hybrid network can be represented as a 4-connected lattice of nodes 
(see figure 15) in which: 

(i) A resistance (of unit magnitude) and a switch (controlled by the line 
element / tJ ) is present in every link between pairs i t j of adjacent nodes. 

(ii) If an observation &• is present at site i, a current of magnitude equal to 
agi is injected to the corresponding node, which must also be connected 
to a common ground via a resistance of magnitude 1/a (see equation 8). 

A direct application of KirchofF current law shows that at each node of this 
network we will have: 

jeNi 

which corresponds to a fixed point of the system (10). In practice, there will always 
be parasitic capacitances which will prevent the instantaneous establishment of the 
equilibrium conditions. However, the time constant of the analog portion of the 
network may be made very fast, so that in fact, the probability distribution of the 
equilibrium states of this network will be Gibbsian with energy U*. 

This scheme can be used, in principle, to construct a special purpose hybrid 
computer for the fast solution of problems of this type. In a digital machine, however, 
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Figure 15. Hybrid network implementing the surface reconstruction algorithm of section 4. 
The voltage at every node represents the height of the surface. Inside every rectangular box there 
is a resistance of unit magnitude and a switch whose state is controlled by the corresponding line 
element (see text). 



the exact implementation of this strategy will in general be computationally very 
expensive, since /J must be computed every time a line site is updated We will 
now present an approximation which has an excellent experimental performance, 
and leads to an efficient implementation. 

First, let us examine one iteration of the, say, Metropolis algorithm at a given 
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temperature T > for the function U*. When a line site is visited and its state is 
updated, the corresponding increment in energy AC// is computed as follows: 

Suppose the line site ij is visited (the line between depth sites i and ;). Let / tJ - 
be its current state, and l i3 the candidate state: 

A 

Hj ==: 1 Hj 

Suppose that the current state of the depth process is 

and let f* be the fixed point of (10) obtained when we replace Uj by 2,-y. Let us 
define: 



and 



Av; y = £ V a (l)-V a (l)+ Y. V h (l)-V h (l) 

C a :lij£C a Cb'.lij£Ci, 



where C a ,C b are the V and "6" type cliques defined in figure 13, and V a ,V bl the 
corresponding potentials. 

Since the depth process is at equilibrium, and we are changing only the element 
lij, we may assume that 

f P ^fp for Py£i,j (12) 

so that 

AU* t « AV-y + 



+ £ 



Cfm ~ f 2 m)[ £ (1 ~ **m) + Ctq m ] ~ 2(/ m - / m )[ £ A(l " km) + <*<7m<7m] 

(13) 



Now, if the absolute difference |/,- - /y| is small, / and / will be practically 
identical; on the other hand, if |/,- - f 3 \ is large, the changes in / at locations i and 
j will be relatively small with respect to this absolute difference. Therefore, we may 
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approximate AU* t by the simple expression: 

AU] » AV {j + (/ y - / t ) 2 (/ t y - 2 t7 ) (14) 

which depends only on the potentials of the cliques to which the updated line 
element belongs, and on the current state of the depth sites adjacent to it. If this 
approximation is to remain valid, the equilibrium condition on / must be mantained. 
This is done by performing M global deterministic iterations using (10) after every 
global stochastic update of the line process. We have found experimentally that the 
use of the approximate expression (14), and only three restoring iterations (M = 3) 
are sufficient to get a good convergence behavior. 

It is also possible to use assumption (12) and the fixed point condition of the 
system (10) to compute a more precise approximation to AU* (the corresponding 
formulae are derived in appendix 5.A). Our experiments indicate, however, that 
the simpler approximation (14) gives sufficiently good results, so that the increased 
complexity incurred by the use of this, more precise scheme does not seem to be 
justified. 

An important issue in the practical implementation of this procedure is the 
determination of the optimal temperature for observing the equilibrium behavior 
of the system. We have found that this can be done effectively in an adaptive way 
by starting the simulation at a relatively large temperature (say, T = 5) and slowly 
decreasing it until the network shows an adequate level of activity (measured, by the 
fraction of sites whose state is modified in one global iteration). We have found that 
a level on the order of 0.1 is adequate in most cases. This technique is similar to the 
Simulated Annealing method for finding the global minim izer of the energy, but 
in that case, the cooling of the system must proceed at a slower rate, and it should 
be continued until the level of activity is reduced practically to 0; if we proceed in 
this way, the final state of the system will correspond (approximately) to the MAP 
estimate. Note that map" 1 map) G F* too, so that the mixed strategy described 
above will also work in this case (see Marroquin, 1984). As we pointed out in the last 
chapter, if the signal to noise ratio is not too low, the configuration corresponding 
to the MAP estimate will be very similar to the optimal one C/pm^mpmX From 



138 



a computational viewpoint, however, the optimal estimator is preferable, since it 
exhibits a faster and more consistent convergence behavior. 

5. Experimental Results. 

We will now present some experimental results that illustrate the performance of 

the optimal Bayesian estimators for surface reconstruction tasks. In these examples, 

we assume that we have the following prior knowledge about the nature of the 

surfaces we are trying to reconstruct: 

(i) The region under consideration can be segmented into a small number of 
subregions. 

(ii) Within each subregion the surface is smooth (the gradient is less than 0.5). 

(iii) The boundaries between regions are piecewise smooth. There are relatively 
few corners. 

(iv) The average height of the discontinuities across boundaries is greater than 
0.8. 

(v) The observations are corrupted by an additive white Gaussian noise process, 
and we have some estimate of its intensity. 

This knowledge is embodied in the model for the line process, and in the 
numerical value of the parameters. For our experiments, we have used a binary 
process with potentials given by equation (2). 

In the first set of experiments, we generated sparse observation points at 200 

random locations of a 30 X 30 rectangular grid. Figures 16, 17, 18 and 19 show 

(with height coded by grey level) the observations (a); the configuration obtained 

by interpolation with no boundaries (b); the final reconstructed surface (c), and the 

boundaries found by the algorithm (d), for: 

(i) A square at height 2.0 over a background at constant height = 1.0 (Fig. 
16). 

(ii) A triangle, with the same characteristics (Fig. 17). 

(iii) A tilted square plane (slope = 0.1) over a constant height background 
with white Gaussian added noise {a = 0.1) (Fig. 18). 

(iv) Three rectangles at different (constant) heights over a uniform background 
(Fig. 19). 
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Figure 16. (a) Observations of a square at height 2.0 over a background at height 1.0 (a white 
pixel means that the observation is absent at that point), (b) Interpolation with no boundaries. 
(c)Rcconstructed surface.(d) Boundaries found by the Algorithm. 
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Figure 17. (a) Observations of a triangle at height 2.0 over a background at height 1.0. (a 
white pixel means that the observation is absent at that point), (b) Interpolation with no boundaries 
(c) Reconstructed surfacc.(d) Boundaries found by the Algorithm. 

In many interesting cases, the observation sites are not randomly distributed, 
but rather tend to be clustered along certain curves. This is the case, for example, of 
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Figure 18. (a) Observations of a tilted square (slope = 0.1) over a background at height 1.0 
with added white Gaussian noise (a = 0.1) (a white pixel means that the observation is absent 
at that point), (b) Interpolation with no boundaries, (c) Boundaries found by the Algorithm, (d) 
Reconstructed surface. 
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Figure 19. (a) Observations of 3 rectangles at heights 2.0, 2.0 and 3.0 over a background at 
height 1.0 (a white pixel means that the observation is absent at that point), (b) Interpolation with 
no boundaries, (c) Reconstru cted surface.(d) Boundaries found by the Algorithm. 

the reconstruction of geological structures from seismic data, or of certain algorithms 
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Figure 20. (a) Observations of a square at height 2.0 over a background at height 1.0 with 
added white Gaussian noise (<r = 0.1). White pixels denote missing observations, (b) Interpolation 
with no boundaries, (c) Boundaries found by the Algorithm, (d) Reconstructed surface, (e) 
Perspective view of (b). (f) Perspective view of (d). 
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Figure 21. (a) Disparity data for a stereo pair of aerial photographs (data kindly provided by 
W.E.L. Grimson). (b) Interpolation with no boundaries, (c) Boundaries found by the Algorithm 
(d) Reconstructed surface. 



143 



for the reconstruction of surfaces from stereoscopic pairs of images, when the stereo 
matching is done only at the "edges" (places where the intensity gradient is large) 
detected in the images. The synthetic example of figure 20 illustrates this situation 
(here we include also a perspective representation of the recontructed surfaces, so 
that the difference between the smooth reconstruction and the optimal estimate can 
be fully appreciated). In figure 21 we illustrate a real example of this situation. It 
represents the interpolation of data obtained along the zero-crossing contours of 
the convolution of a stereo pair of aerial photographs (depicting the campus of 
the University of British Columbia) with a "Difference of Gaussians" operator, by 
Grimson's implementation of the Marr-Poggio stereo algorithm [G4,M2]. We will 
come back to this example when we discuss the stereo matching problem in detail 
in the next chapter. 

We have also used a modified Simulated annealing scheme to get the MAP 
estimator for the same examples presented above (see Marroquin, 1984). The final 
configurations are very similar to the optimal ones, so we do not reproduce them 
here. With respect to the computational efficiency, it took, on the average, around 
450 global iterations (in a global iteration the state of the complete line field is 
updated, and the equilibrium of the depth field is restored) for the Simulated 
Annealing algorithm to converge, while for the OpmAmpm) estimator, only 250 
were needed. Also, in the latter, the behavior of the algorithm was more consistent 
in the sense that the difference in the results from successive runs with the same 
data were smaller than in the former case. 

6. A Fast Algorithm. 

The ergodicity of the "Gibbs chain" (the Markov chain generated by the 
Gibbs Sampler or the Metropolis algorithm at a fixed temperature) means that its 
time behavior mirrors the ensemble probabilistic structure. Since the probability of 
turning "on" a given line element depends on the difference in the values of the 
associated depth elements (i.e., on the current value of the gradient of the field / at 
that location), the configurations with active lines at points of high gradient will be 
generated first. These lines, in turn, will decouple the adjacent depth sites, increasing 
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the gradient even more, generating thus a positive feedback that stabilizes these 
configurations (the opposite happens in regions of low gradient, which prevents the 
formation of stable clusters of lines at those points). 

We can see, therefore, that the behavior of the Gibbs chain can be thought 
of, qualitatively, as performing in time a scale separation of the discontinuities 
of the image. This suggests the use of a deterministic scheme that performs the 
same separation, but compressing the time of the Gibbs chain. A simple way of 
implementing this idea, is to introduce a time varying coupling between the depth 
and line fields, and to allow only "downhill" moves (i.e., those with negative AU*) 
in the updating rules for the line process. Specifically, we compute the increment 
in energy associated with the update of the line element l i3 - at time t using: 

AU* = AV {j + K(tU - /,)%• - Ui) (15) 

instead of equation (14), and accept the candidate state only if AU* < 0. The 
coupling strength K(t) is computed using: 

K{t) = K + ht 

( where K and h are positive constants) until it reaches a given value K Tt and it is 
held constant at this value thereafter. The state of the depth process is updated, as 
before, using equation (10). K must be chosen in such a way that with / == f* Q and 
/,. = o for all z, no lines will be turned "on" in the first iteration. This means that 
if we use equation (2) (with = 1, and the values of <j> given in the corresponding 
tables) to compute the potentials, we must have: 

K ° < — (16) 

where 

a = Bup(/ t - - /y) 2 

On the other hand, the final value of K{t) (i.e., K T \ must be such that no lines are 
introduced in the smooth regions. Let 

6 = inf(/,-/ y ) 2 
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c = sup(/ t - - fjf 



Sm 



where D is the set of neighboring pairs of sites such that each site belongs to a 
different smooth patch (i.e., pairs that lie across a discontinuity), and Sm is the 
complementary set of pairs of adjacent sites such that both sites belong to the same 
continuous patch. Kt must satisfy: 



0.25 Tjr 0.25 
-r- <K T < 



Note that even if we do not know the precise values of a, 6 and c for a given 
problem, usually we can estimate them accurately enough to determine "safe" values 
for K and K T . The value of h controls the number of iterations needed for the 
algorithm to reach a fixed point; if h is too large and the observations are relatively 
sparse, we might get suboptimal solutions where regions with no observations are 
completely surrounded by lines, and therefore, adopt spurious constant values. We 
have found experimentally that usually 50 iterations, i.e., setting 

50 

are enough to produce results that are indistinguishable from those produced by 
the Monte Carlo approximation. 

This scheme has an additional advantage: the optimal value of the coupling 
between the depth and line fields (the constant p in equation (2)) depends on the 
height of the discontinuities relative to the gradient in the smooth patches. It is, 
therefore, a free parameter of the Monte Carlo algorithm that must be adapted to 
each particular problem. Since in the deterministic scheme it is varied dinamically, 
its adaptation to each problem is automatic, provided that we choose K T and K 
sufficiently large and small, respectively, so that the procedure has practically no 
free parameters. 
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Figure 22. (a) Coloring of the coupled line-depth lattice, (b) and (c) Elements whose state 
is stored in each of the two types of processors of a 4-connccted parallel architecture. 

7. Parallel Implementations. 



Both the general Monte Carlo procedure of section 5 and the deterministic 
algorithm of the last section can be efficiently implemented in a parallel architecture. 
To study this implementation, we first note that the chromatic numbers (see section 
6.2 of chapter 3) of the graphs associated with die line and depth neighborhood 
systems are 4 and 2, respectively, which means that the coupled process has a 
chromatic number of 6. In figure 22 (a) we illustrate one possible "coloring". 

The colors of the line process are represented by the numbers 1,2,3,4, and 
those of the depth process by white and black circles. The updating process can 
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be implemented in a 4-connected architecture such as the "Connection Machine", 
by assigning one processor to each depth site and its four adjacent line elements. 
We will thus have two different populations of processors, whose configurations are 
shown in figures 22 (b) and (c), respectively. 

Each complete iteration consist on 6 major cycles: in the first two, the state of 
the white and black depth variables is respectively updated, and in the next four, 
the new states of the binary line variables stored in (say) the white processors are 
successively computed and transmitted to the corresponding memory locations of the 
neighboring black processors. Note that in this scheme we have some redundancy 
in the use of memory (each binary variable is stored twice), but the state of all 
the elements needed for each updating operation is always available from adjacent 
processors. 

7.1. Connection Machine Execution Time. 

The update of each depth site requires 2 (16-bit) multiplications; 5 additions 
and 10 1-bit comparisons, that is, about 600 cycles of a 1-bit processor. The 
computation of the increment in energy for the line process (equation 14) requires 
1 multiplication; 5 additions and 13 1-bit operations, that is 350 cycles. For the 
deterministic algorithm, we require 256 additional cycles for the multiplication 
by the variable coupling constant, while the exponentiation and random number 
generation needed for the Monte Carlo updating use about 2300 additional cycles 
(we assume that the updating of the coupling constant is done once every complete 
iteration in the host computer, and the new value broadcast to the whole network). 

Considering that the Monte Carlo algorithm requires about 200 iterations to 
converge, while only 50 are needed in the deterministic case, we get the following 
approximate estimates for the total execution time in the "Connection Machine" 
(using the same assumptions as in section 6.3 of chapter 3): 2.4 seconds for the 
Monte Carlo procedure, and 0.18 seconds for the deterministic algorithm. 

7.2. Analog Networks. 
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In chapter 4 we discussed the use of the "neural" networks introduced by 
Hopfield (1984) (see also Hopfield and Tank, 1985) for constructing analog systems 
that approximate the optimal estimators of binary fields. Since for a binary system, 
the TPM and MPM estimates are equivalent (see chapter 3), we can, in principle, 
replace the digital computation of the / field in the hybrid scheme discussed above 
(see figure 15) by a "neural" network that approximates the optimal estimate coupled 
with the analog "/" network (note that the switches must be replaced by analog 
devices that implement a multiplication). The time constant of the "neural" network 
has to be adjusted so that the "/" network remains in equilibrium and the search 
space is effectively restricted to the set F* (see section 4). 

To implement this idea, we must define a new energy function that depends 
continuously on /, and whose behavior is similar to U P for /,■ e {0, 1} (Hopfield, 
1985). One such function is: 

E(f, i) = k E E (/.■ - /y) 2 (i - in) + «k E(/,- - 9i? + 

* jew ies 

+*£ £ '.( £ '*-!)» + <*£«!-« + 

i C a :ieCa k£C a -{i} i 

+ c a E E Wy (23) 

c b :iec b jec b -{i} 
where K, a, ci, c 2 , c 3 are constants. 

Following the construction discussed in section 5 of chapter 4, we can now use 
an analog network that implements the dynamical system: 

duj _ dE 

dt ~~ dli ~ Ui 

U = e(u t ) 

Where the function e, which corresponds to the gain of the non-linear amplifiers 
that are at the nodes of the network, is as defined in equation (15) of chapter 4 (note 
that in this case the network also contains non-linear elements that act as analog 
multipliers). 



149 



We have performed numerical simulations of this method, and the results are 
similar to the optimal ones if the parameters of the system are selected appropriately. 
The system can be made practically data-independent by making the coupling K 
between the two networks (see equation (23)) time-varying, in the manner that 
was described in section 6. We have found that a reasonable set of values for the 
remaining parameters is: ci = .15; c 2 = .05; c 3 = 1.5. 

8. Discussion. 

In this chapter we have studied the problem of reconstructing piecewise 
continuous surfaces from sparse and noisy data. We showed that such surfaces 
can be adequately modeled by two coupled MRF's: A depth field with quadratic 
potentials and a binary "line" field with sites in the dual lattice, and with potentials 
that represent our prior knowledge about the geometry of the curves that bound 
the smooth patches. 

We pointed out that a straightforward extension of the general estimation 
procedures derived in chapter 3 to this problem is computationally unfeasible, due 
to the continuous nature of the depth field. Therefore, we proposed a modified 
computational strategy that is based on the fact that the search space for the optimal 
estimates can be restricted to those configurations in which the depth field minimizes 
the (quadratic) conditional posterior energy for each given line configuration. The 
plausibility of this scheme was demonstrated by experimental results showing the 
reconstruction of both synthetic and "real" surfaces. 

We also derived, based on heuristic arguments, a fast deterministic algorithm 
with excelent experimental performance, and whose parameters can be made 
problem-independent, and discussed the implementation of all these procedures in 
parallel digital machines, and in hybrid and analog networks. 

It is interesting to compare the techniques we have presented with other surface 
reconstruction methods that handle discontinuities. The most successful of these 
(see Terzopoulos, 1984) are based on the idea of interpolating a smooth surface 
first and then, detecting the discontinuities by a threshold mechanism. We believe 
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that the method that we are proposing has some advantages over this scheme which 

justify its use in spite of the increased computational cost: 

(i) From a conceptual viewpoint, it is better to perform the interpolation and 
boundary detection tasks at the same time, rather than approximating 
an everywhere smooth surface first, since this operation hides the 
discontinuities that one then tries to find in the second phase. 

(ii) In our method, the values of the parameters depend only on the average 
height of the jumps that one wants to consider as boundaries in the 
reconstructed surface, and thus, they are independent of the location of the 
observations. If these are sparsely located, even when the discontinuity is 
relatively large, the threshold method may fail. 

(iii) A priori knowledge about the shape, orientation and position of the 
discontinuities can be easily incorporated by choice of the potentials of 
the line process. This fact makes our method particularly promising for 
integrating information from qualitatively different sources into a single 
unified estimation procedure. 

(iv) The same algorithm can be used for surface interpolation, noise elimination 
(smoothing) and boundary detection. 

We will now study a related problem: the reconstruction of surfaces from 
stereoscopic images. 
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Appendix 5. A 



HIGHER ORDER APPROXIMATION TO AU* 



In this appendix we describe a higher order approximation to the energy 
increment AU* (see section 4 of chapter 5). We will compute AU* using: 



+ £ 

m=t,y 



AU t « AVij + 



Cfm - fm)[ ]£ (l ~ l km) + Ctq m ] - 2(/ tn - / m )[ J] A(l ~ km) + a 9m g m ] 

(i) 



using the assumption: 



fp^fp for p=^*,j 



the new equilibrium configuration / can be estimated by the following formulas, 
which correspond to the fixed point of: 



(k) 



J 3 



(ifc+l) _ D«€ty *ijfi + otqjgj 



EieN,- Uj + otqj 



(2) 



when / P , p 7^ i, ;' is held fixed: 
Let: 



hm = 



r A 

1 — km> for k,m — i,j 

k l - km, otherwise 



The new equilibrium configuration will be a fixed point of (10), and therefore, it 
will satisfy: 

1 



J m 



hmfrn + a <lm9r 
kEN m 



for 



m = i, j 
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If I- = o and 7 t -7y 7^ 1, we get 



U 



nij 



T {7y 



J2 fkhk + aqigi 



IkENi-ti} 






/,= 



7j 



kENj 



if lij = and 7,-7y = 1, it means that there are no observations, neither at i nor at 
;\ and that these two sites are isolated from the rest of the lattice by line elements. 

Therefore, we use: 

/* + fj 



h = U 



Finally, if ? tJ - = 1, we put 



f = 7m 
Jm l L 

\Jm* 



A 



if 7m 7^ 

if7m = 
for m = i,j. 

Besides, if die move from Z to 2 is accepted by Metropolis criterion, we replace 

A 

fm = f m , for m = i, j 

As described in chapter 5, after all I sites have been updated, M restoring 
iterations using equation (10) of that chapter should be applied. 
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Chapter 6 
SIGNAL MATCHING 



J. Introduction. 

In all the estimation problems we have studied so far, the posterior energy 
function had the form: 

Up(f;g) = U (f) + 52Wfi,9i) (1) 

where U (f) corresponded to the MRF model for the field /. The functions $,-, 
whose precise form depended on the particular noise model, were non-decreasing 
functions of the distance between /,- and g { (see equation (2) of chapter 3): 

*i(f,9i) = -lnPnif*- 1 ^^/)) 

There are some cases, however, when the conditional probability distribution 
of the observations P g \/{g) f) is multimodal (as a function of /) which causes the 
functions $, to be non-monotonic, so that the solution to the problem remains 
ambiguous, even if the observations are dense, and the signal to noise ratio arbitrarily 
high. To illustrate this situation, we will study an important instance of it: the 
"signal matching" problem, whose one-dimensional version is as follows: 

Consider two one-dimensional, real valued sequences h Li h Rt where h L is 
obtained from h R by shifting some subintervals according to the "disparity sequence" 
d: 

h L (i) = h R (i + di) 

with 

di€Q = {-m, -m + 1, . . ., -1, 0, 1, . . ., m} 
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The signal matching problem is to find d given h L} hu. (In a more realistic 
situation, we do not observe h L ,h u directly, but rather some noise-corrupted 
versions gi,gn). Some interesting instances of this problem are the matching of 
stereoscopic images along epipolar lines (Marr and Poggio, 1976); tine computation 
of the dip angle of geological structures from electrical resistivity measurements 
taken along a bore hole, and the matching of DNA sequences. 

To make the discussion more specific, we will consider a simple example, in 
which the sequences h L , h H are binary Bernoulli sequences; we will assume that the 
noise corruption process can be modeled as a binary symmetric channel with known 
error rate, and that d is known to be a piecewise constant function. A well known 
instance of this problem is the matching of a row of a random dot stereogram with 
density p (Julesz (I960)), when the components of the stereo pair are corrupted by 
noise. 

The stochastic model for the observations is then constructed by assuming that 
the right image is a sample function of a Bernoulli process A with parameter p : 

The left image is assumed to be formed from the right one by shifting it by a 
variable amount given by the disparity function <f, except at some points where an 
error is commited with probability e. Note that some regions that appear in the right 
image will be occluded in the left one (see figure 23). The "occlusion indicator" <f> d 
can be computed deterministically from d in the following way: 

f 1, if d t _jt > di + k, for some integer k e (0, m] 
Mv = \„ . . (2) 

10, otherwise 

The occluded areas are assumed to be "filled in" by an independent Bernoulli 
process B. The final model is then: 

I9n(i + d{), with prob. 1 - e, if <j> d (i) = 

1 - 9r{% + d^ with prob. e, if <f> d (i) = (3) 

B p (i) t with prob. 1, if <f> d (i) = 1 
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^ Lines of Constant 
Disparity 



Figure 23. Occluded Regions: The horizontal and vertical axis represent points in one row 
of the left and right images, respectively. Matching points arc represented by black circles. Any 
match in the shaded region will occlude the point i 



Note that in the two-dimensional case, the index i denotes a site of a lattice, and 
therefore it can be represented as a two-vector (ii,i 2 ) whose components denote 
the column and row of the site, respectively. To simplify the notation, we will adopt 
the following convention throughout this chapter: when a scalar is added to this 
vector index (as in g R (i + d { ) and d i+k \ it will be implicitly assumed that it is 
multiplied by the vector (1,0) (so that the above expressions should be understood 
as git{i + {di, 0)) and d i+ ^ k)0 ^ respectively). Using this convention, the observation 
model of equation (3) can be applied either to the one or to the two-dimensional 
cases. 

Notice that even if the observations are noise-free (e = 0) the solution of the 
problem remains ambiguous, and it cannot be uniquely determined unless some 
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prior knowledge about d (for example, in the form of a MRF model) is introduced. 
The use of a MRF model in the stereo matching case, corresponds to a quantification 
of the assumption of the existence of "dense solutions" (this term was introduced 
by Julesz (1960), and essentially corresponds to the assumption that the disparity d 
varies smoothly in most parts of the image; see also Marr and Poggio (1979)), and 
the use of the occlusion indicator corresponds to the "ordering constraint" (i.e., the 
requirement that if % > j, then i + d { > j + d jy see Baker (1981); we put (j> d == 1 
whenever this constraint is violated). 

2. Bayesian Formulation. 



To formulate the estimation problem, we will consider the sequence gi as 
"observations", while g R will play the role of a set of parameters. Thus, from (3), 
we have (assuming, for simplicity that p — \)\ 

P(9L{i) = k\d,g R ) = P gld (k) = 

1 - e, if <j> d {i) = and g R (i + d,-) = k 
e, if <f> d {i) = and g R (i + <f t ) ^ k 

i, if^W = l 

The posterior distribution P d \ g will then be: 

Pd\ g \d) = — 5 = 



= — exp 

r 9 



. io ^ 



I[{[(1 -€)S(g L (z)-g R (i^d i )) + 



*M 



+e(l - 6(g L (i) - g R {i + d { ))](l - <j> d {i)) + ^} 



*( 



where 

if x = 

otherwise 

As a prior model for the disparity field, we may use a first order MRF with 

generalized Ising potentials, such as the one presented in chapter 4. Other models 



lo, 
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may also be used, including the coupled depth and line fields that we discussed in 
the previous chapters. For the present, let us assume that the simpler Ising model is 
adequate. Note that even when the matching problem is one-dimensional (we are 
asuming that there is no vertical disparity between the images, so that the matching 
can be done on a row-by-row basis), the two-dimensional nature of the prior MRF 
model for the disparity introduces a coupling between matches at adjacent rows. 
The posterior energy is: 

U P [d; 0) = ^- £ V ( di > d i) ~ £ MIU " *)%/,« - gii{i + *)) + 
«.j t 

+6(1 - S(g L (i) ~ 9ft(i + *))](1 - «*)) + ^} 
Using the fact that for any a,6^0 

\n[a6{x) + 6(1 - 6{x)\ == 6{x) In a + (1 - 6(x)) In b 

we can write an equivalent expression for U P (modulo an additive constant): 
Up{d; g) = ± E V{di, dj) + W WO In 2 + 

+ f D 1 - MW(9L{i) - g*(i + di)) (4) 



where 



— ■fe) 



3. Optimal Estimator. 



It is possible to apply the general Monte Carlo algorithms developed in chapter 
3 to approximate the optimal estimate d with respect to a given performance measure 
(such as the mean squared error). Their use in this case, however, is complicated by 
the introduction of the occlusion function <f> d in the posterior energy: the size of the 
support for this function equals the total number of allowed values for the disparity 
(see equation (2)). If this number is large, the computation of the increment in 
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energy, or of the conditional distributions (if the Gibbs Sampler is used) may be 
quite expensive. In many cases, however, the size of the regions of constant disparity 
is relatively large compared with the size of the occluded areas. In these cases, one 
can approximate the posterior energy by: 

u i>( d ) = Y, £ Vfa d 3 ) + ^ £ 6(g L (i) - gu (i + di)) (5) 

and increase significantly the computational efficiency (we have successfully used 
this approach to reconstruct the disparity of random dot stereograms). 

In the one-dimensional case, it is also possible to extend the dynamic 
programming methods described in appendix 4.B to compute the MAP estimate; 
this extension is described in appendix 6.A. 

An alternative approach to the solution of this problem is to implement the 
local constraints, generated by the prior MRF model, directly in a deterministic 
"cooperative network" of a given form (a "Winner-Takes-All" network) whose 
fixed point will correspond to the optimal solution. This will be done in section 
6. First we present, in section 4 the definition of a "Cooperative Algorithm", and 
describe and analyze, in section 5, the previous work that has been done in this 
connection. 



4. Cooperative Algorithms. 

Consider the two-dimensional signal matching problem defined in section 2, 
and suppose that to each site % of the lattice n we associate a set of binary variables: 
{fi,d,d€ Q} (we will call this set the "i th column" of the network /, and the set: 
{f%,d, i € fi}, the "disparity layer d" of the same network). 

If a particular variable f ijd = 1, it means that we assign to site i the disparity 
d (note that more than one disparity may be assigned to a node at a given time). 

A "Cooperative Algorithm" (Marr and Poggio, 1976; it is also known as a 
"Cellular automata"; see Wolfram, 1983) is a rule for updating the state of the 
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network /. It can be represented formally as: 

/U* + l) = ^,c/(/W,«) 

with the additional requirement that the interactions should be local, that is: 
FiAf(*)> *) = FUUiAt)'* € TV,-, s e Q}, t) 

where TV,- is the (two-dimensional) neighborhood of site i £ H. The idea is to define 

the functions F (i.e., the connections of the cooperative network) in such a way that 

the following local constraints are implemented: 

(i) Compatibility with the observations: Each element f i>r should receive 
an "excitatory" external input proportional to the conditional probability 
PrfctoM = 9tt(i + r) | d { = r). 

(ii) Smoothness: This corresponds to an implementation of the MRF prior 
model for the disparity: the likelihood that an element f iyd is turned "on" 
(i.e., is set equal to 1) should increase if the elements {f Jtd ,j £ TV,-} are 
"on" (TV,- is the neighborhood of i in the Markov model), so that excitatory 
connections should exist between these elements. 

(iii) Uniqueness: Since in the final configuration /* one and only one element 
of each column {f* fd , d e Q} should be equal to 1, each element should 
have "inhibitory" connections with the other elements of the same column. 

The operation of the network will be Synchronous if all its elements are updated 
in parallel at the same time, and Asynchronous if they are updated sequentially, 
one at a time. Note that one synchronous iteration is equivalent to |/| (the number 
of elements of the network /) asynchronous ones (we will refer to |/| succesive 
iterations as a Global Iteration), and that the evolution of the asynchronous network 
will depend, in general, on the order in which its elements are updated. 

5. "Linear Threshold" Networks. 

The first successful application of this approach (although not formulated in 
probabilistic terms) is the algorithm developed by Marr and Poggio (1976) for the 
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stereo disparity computation. They proposed a binary network of the form: 

fi(t + 1) = a{ Pi ) 



with pi = 



£/j(*Hy + w-* 



3 



hjenxQ; (6) 



. . (1, if p > 
10, otherwise 



w {j satisfying wi 3 - = wj if for all i,j £ n X Q 

and /,- e {0, 1}, for all i 

The parameters w {j , ra and must be chosen in such a way that the constraints 
to the solution of our problem are implemented locally. In particular, the smoothness 
constraint is implemented by defining: 



w. 



s,d,v,d = i, for y G AT X ; x,y eo 



where N x is an excitatory neighbourhood of x. The uniqueness constraint, by: 

™x,d t y,d' = ~ e > for {y t <*') e M Xtd 

with M X)d an inhibitory neighbourhood corresponding to multiple matches at x (see 
Marr and Poggio (1976) for a precise definition of these neighbourhoods), and 

w x,d, y ,d' = elsewhere. 
The compatibility with the observations is enforced by putting 

f o J 1 ' if 9r{z + $ = 9l(x) 
Vx,d = f X) d = < (7) 

10, otherwise 

Although it has not been possible to this date to find a rigorous proof for the 
convergence of this algorithm, numerical experiments and a probabilistic analysis 
(Marr et. al., 1978) show that the synchronous network defined above will converge to 
reasonably good solutions for random dot stereograms portraying piecewise constant 
surfaces. However, this scheme has several problems (although some modifications 
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to get around them are suggested in Marr and Poggio, 1976 and in Marr et. al., 
1978): 

In the first place, the quality of the results degrades very fast as the density of 
the tokens in the stereogram decreases. Besides, it is not clear how to extend this 
formulation to the more interesting cases of slowly varying disparities, and different 
types of tokens placed in points that do not correspond to a regular lattice. 

5.1. Asynchronous Algorithms. 

We now consider algorithms of the form (6) that operate asynchronously. In 
this case, it has been shown (Hopfield, 1982) that if we choose the parameters in 
such a way that p t - is never (this can be done, for example, if iu tJ - and rji are 
integers, by giving a non-integer value), the "Energy" function: 

£(/) = - \ E "ijfifj - E Mm ~ 0) (8) 

will decrease monotonically at every global iteration of the asynchronous algorithm 
in which the state of every element is updated, unless the network is at a fixed 
point. 

It is interesting to note that with the parameter definitions given above for the 
stereo problem, the term 

~~nJx,d 2s fy,d 
* ven x 

in (8) will be negative only if all the spatial neighbors of the cell x on the same 
disparity layer are "on", and therefore corresponds to a smoothness constraintThe 
term 



fx,dfx,d 

corresponds to the compatibility with the observations, and the remaining terms: 
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fx,d 



+ -, E V 

Z y,d'eM a ,d 



may be considered as an implementation of the uniqueness constraint, since their 
minimization requires that we have as few "on" cells as possible, and it penalizes 
explicitly the local non-uniqueness of the solution. 

5.2. Experimental Performance. 

To study the performance of these algorithms, we implemented a simulator 
of both the synchronous and asynchronous networks. The "stimulus" used for the 
set of experiments performed, was a random dot stereogram portraying a square of 
21 x 21 elements floating at disparity -2 in front of a flat background at disparity 0. 
Figure 24 shows this stereogram and the fixed points obtained by the synchronous 
and asynchronous algorithms. 

In both cases, the behaviour of the algorithm shows two distinct phases: In the 
first iteration, most of the elements that are "on" on the wrong layers (and some on 
the correct ones) are turned "off" (see figure 24-b). As a result of this, at succeding 
iterations, the probability of having a cluster capable of growing is relatively high 
for the correct regions, which begin to fill in, and very small for the wrong ones, 
for which the remaining "on" cells are turned "off". 

This form of operation causes that the precise shape of the boundaries between 
regions will depend on the exact shape and location of the random clusters that are 
formed after the first iteration on the correct layers. Also, it is easy to see that the 
form of the inhibitory neighbourhood (see Marr and Poggio (1976)) causes the cells 
lying on wrong layers along a narrow band near the edges of the background to be on 
the average less inhibited by the "on" elements in the correct layers (which in turn 
are less stimulated) than the interior points, making thus more likely the formation 
of wrong stable clusters in these regions. This effect is more pronounced in the 
asynchronous case, since a wrong cell that is left "on", can increase the excitation 
of a neighbouring one on the same global iteration, increasing the likelihood of a 
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(a) 






(b) 





(c) 




(d) 



Figure 24. (a) Random dot stereogram portraying a 21 x 21 square at disparity -2. (b) 
State of the network after one iteration of the synchronous algorithm, (c) Fixed point for the 
Synchronous Algorithm, (d) Fixed point for the Asynchronous Algorithm. 
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stable cluster, whereas on the synchronous case, all the cells of the cluster must be 
left "on" at the same time. 

For the values used for the parameters (e = 2, = 3.5 ; see Marr and Poggio, 
1976) the energy defined in (8) decreases monotonically at each global iteration 
of the asynchronous network, and thus, it converges to a configuration that is a 
local minimizer of this function. The correct solution will also correspond to a 
(different) local minimum; it is interesting to note, however, that in general it will 
not be the global one. It is easy to show, for example, that if the random dot 
stereogram portrays a region that has a ratio of area/perimeter less than a critical 
value (for the current value of the parameters this critical ratio is « 13), this region 
will not be distinguished from the background in the configuration that globally 
minimizes the energy. This means that the use of simulated annealing to minimize 
(8) will not necessarily improve the solution; however, we have found that after the 
deterministic algorithm has converged, a few iterations of Metropolis algorithm at 
a moderate temperature (» 1) may be very effective for removing the clusters at 
wrong layers. This is illustrated in figure 25. 

• i '. • 
6. Winner-Takes-All (WTA) Networks. 

Linear threshold networks are not the only form of local implementation of the 
constraints generated by the probabilistic formulation of our problem. A different 
possibility is to associate with each column {f X)d) deQ}a binary "Winner-take-all" 
synchronous network: 

The input u(x, d) to each cell corresponds to the excitatory input in the linear 
threshold case, that is, to the local implementation of the smoothness constraints 
and the compatibility with the observations. 

The inhibitory terms (the uniqueness constraint) are implemented in the form 
of a WTA mechanism. The output (the new value of f Xfd ) is given by: 

(1, if u(x, d) = max<,'e<2 u{x, tf) 
fx,d = \ \ J ) 

10, otherwise 
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(a) 





(b) 




(c) 



Figure 25. (a) Fixed point at T = 0. (b) State after 4 iterations at T = 1. (c) Fixed point at 
T = with (b) as initial state. 



This means that f Xtd will be "on" at time t + 1 only if it is maximally stimulated 
with respect to all the other elements in the same column at time t, and if it is 
"compatible enough" with the observations (see figure 26). 
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External Input : P 



Excitatory 

Connections 

(MRF Model) T 




Column 

(WTA Mechanism) 



Figure 26. Winncr-Takcs-All network (sec text). 



This design has several advantages : 

1. For dense stereograms, we will show that it converges to the correct solution 
in a small number of iterations. 

2. For sparse stereograms, the algorithm will give, with high probability, the 
correct disparity at every location in which a matching token is present 

3. It exhibits a good performance with natural images portraying piecewise 
constant surfaces. 

4. It is not necessary to process the whole domain n at the same time; a 
complete representation may be built up by defining local networks corresponding 
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to overlapping subregions that cover n. This feature enables the algorithm to process 
arbitrarily large images. 

5. Jt can be extended in such a way that it can handle more complex situations, 
such as transparent and piecewise smooth surfaces. 

Qualitatively, this improved performance can be explained as follows: 

Unlike the linear threshold design, in the first iteration the WTA algorithm will 
only turn "off' cells that do not lie in the correct disparity layers. This will cause 
the cells that lie at the boundaries of clusters at the wrong layers to lose, in the 
subsequent iterations, against the corresponding strongly stimulated cells that lie in 
the interior of the "correct" regions. This will result in a progressive shrinking of 
the wrong clusters, and will end up with their disappearance. 

This results in a faster convergence, since the size of the clusters that have to 
be killed is in general smaller than the size of the regions that the linear threshold 
algorithm has to fill in. Also, the boundaries between constant disparity regions will 
be more accurately localized. 

The only situation in which this behavior will not take place, is when there is 
a significant overlap (due to accidental correlations in the images) between regions 
lying at different depths. In this case, the algorithm will not be able to solve the 
ambiguity correctly based only on smoothness considerations, and it will locate the 
boundary at a position, within the region of overlap, which will depend on the 
detailed shape of this region. Also, the solution will not be so clean in this case; a 
few cells, corresponding to different disparities at the same spatial position, may be 
left "on" in the final state (limit cycles involving some of these few cells are also 
possible). 

This type of ambiguity (accidental overlap) is relatively frequent in sparse 
stereograms. However, the regions of overlap are typically "blank" regions (i.e., 
without tokens), and the algorithm will give the correct disparity at all token 
locations. 

We will now make these considerations more precise. First, we will need some 
definitions. 
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1. n will be defined as the set of points lying on a finite square lattice. 

2. We will use a second order MRF with Ising potentials as the prior model for the 
disparity field. Therefore, for each x £ n, we define its neighborhood as: 

M x = Qf){y : < |x — y| < 2} (10) 

3. Given a region R C n, we define the set of its interior points (with respect to 
N x ) I{R) as the set of points in R such that all its neighbors also belong to R : 

I(R) = {xeR:\N x f)R\ = \N x \} 

In a similar way we define: 

I\R) = I(I(R)) 

and so on. We call the points in R that are not interior: x 6 R - I(R), Boundary 
points ofR. We will say that a region R is connected if, given any two sites i f j e R, 
we can find a sequence of sites {t = t , h, . . ., i p = j}, with i k eRfork = l,..., Pt 
such that i k £ N ik+l for A; = 0, . . ., p - 1. 

4. Given a region R C n, we define its Diameter D(R) (with respect to iV x ) as the 
smallest integer such that: 

Alternatively, if we define an algorithm that deletes all the boundary points of a 
region at every step, the diameter of the region is the minimum number of steps 
necessary to completely delete the region. 



5. The initial state of the network will be given by: 

if 9r(x + d) = g L (x) 



Jx,d — ) 

lo, 



otherwise 
6. The WTA algorithm for this problem will have the particular form: 

r (>a.i\ J 1, if u ^A t ) =zmax d'eQu x>d it) 
10, otherwise 



en) 



«x,dW = «/"d+ E f.Jf) (12) 

veN, 
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7. Wc will assume that the set Q can be covered by M + 1 non-overlapping regions: 

n = *iU-..U*AfUo 

and that the correct solution (i.e., the way the stereogram was generated) consists in 
assigning to every point in Ri the depth d t -: 

fxA = 1 iff * e Ri 

The set O corresponds to the union of all the regions that are occluded in the left 
image (see figure 23), and therefore, for every x € O, any depth assignment will be 
considered "correct". 

8. Since we are assuming that the observations are perfect, the loading rules 
guarantee that 

f xA = 1 for every xeRi 
However, in many cases we will also have: 

fl,dj = 1 for some x £ R { and dj ^ d { 
due to accidental correlations in the images. A connected set Wj defined as: 

Wj = {x : f° xd . = l and x G Ri for some dj ^ <•} 
will be called a wrong cluster on layer j ofR it 

9. We will say that a stereogram has well defined boundaries if there are no large 
wrong clusters overlapping the boundaries between adjacent regions. This means 
that every non-occluded point must have at least as many "on" neighbors at time 
on the correct layer as in any other layer, i.e., for every region R k and every point 
*eRk, 

£ fy,d k > E fid for all rf ^ d k (13) 

V6N X yeN, 

10. A stereogram will be said to be unambiguous if for every region R { and every 
wrong cluster Wj there is at least one point x e Wjf]Ri which has less "on" 
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neighbors at time on the wrong layer dj than in the correct one d,-, i.e., 

E fU < E flu (h) 

yeN x y eN x 

We can now establish the following result: 



Convergence Theorem: Given an unambiguous random dot stereogram with perfect 
observations (0 error rate) portraying M non-overlapping regions of constant depth 
with well defined boundaries, the WTA algorithm (12) with a > 8 will converge to 
the correct solution in K iterations, where K is the diameter of the largest wrong 
cluster in ft. 



Proof: 



1) First, we note that condition (13) guarantees that all the cells on the correct layers 
(which, by (11), are "on" at time 0) will remain "on" at time 1. 

2) Condition (14) and the definition (12) guarantee that for every wrong cluster Wj 
on every region R { there will be at least one point x that will be turned "off in the 
first iteration. Then, for all points y e N x f| W 3 - f| Ik we will have: 

X, fz,di < J2 fz,di 

Z£Ny Z£Ny 

which implies that f® d . = 0. 

A recursive application of this reasoning establishes the theorem, i 

Remarks: 

1. For occluded regions, there will be no large clusters of "on" cells in any layer of 
f°, and since the form of (12) precludes the growth over regions with f° = 0, if 
there are any isolated points for which f xd = l, they will remain "on" in /* (the 
fixed point of (12)); otherwise, /* = uniformly over these regions. 
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2. If the algorithm has ambiguous boundaries, we can still use this theorem to 
guarantee the convergence of the WTA algorithm to the correct solution outside the 
overlap regions. It is clear that if we define new non-overlapping regions R\, . . .R' M 
with non ambiguous boundaries, and include the overlap areas in the set O, the 
theorem will guarantee that we get the correct solution in the new regions. In the 
overlap areas, the stable state of the network may include some leftover ambiguity 
(fx,d = 1 for more than one d), and even limit cycles involving a few cells. However, 
these problematic areas will be confined to layers of unit width along the portions 
of the (final) boundaries that lie inside the overlap regions. 

3. The probability of finding wrong clusters in a binary stereogram is related to the 
probability of finding a repeated subsequence on a Bernoulli sequence of length 
equal to the total number of disparity layers, and decreases exponentially with the 
number of cells belonging to each of these clusters. For dense sterograms (generated 
by a Bernoulli process with parameter p = ±), the probability of finding a wrong 
cluster that contains a square of m cells per side can be bounded by 

Pr(cluster) < £g 

where N D is the number of disparity layers, and \Q\ is the total number of cells in 
the lattice. On the other hand, a cluster of diameter A: must contain at least a square 
of side 2k + 1. Thus, if N D = 7 and |n| = 64 2 , for example, we can guarantee that, 
for dense stereograms, the algorithm will converge to the correct solution in less 
than 3 iterations with probability > 0.99. 

4. For sparse stereograms, wrong clusters involving only "blank" areas will be very 
common, but those containing active tokens will be rare. This fact, together with 
remark 2, mean that, with high probability, tha WTA algorithm will find the correct 
disparity at all the sites that have active tokens. This has been confirmed by our 
experiments. 

5. Algorithm (12) will not grow regions into occluded (uncorrelated) areas. 
Psychophysical experiments show that these areas should be included with the 
adjacent region that is at the greatest depth. It can be verified that an algorithm 
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such as the following: 

/*,,/(* + 1) = j 1 ' if ^ yeN * fy > d{t) > V'-A^veN* ! v M\ d'¥>d 
lo, otherwise 

with / X|d (o) = f* £)d (the fixed point of (12)), will converge to a solution in which 
these regions are correctly filled in, provided there are no wrong clusters in the 
occluded regions, and that each layer of constant d is allowed to converge separately, 
Starting with d = dmin = m'm(d £ Q). 

6. Note that even when (x,, x 2 ) G fi, (z, + d, x 2 ) may lie outside n and so, if we load 
the network using (11), some cells near the boundaries of n may remain undefined, 
and (12) may give incorrect results. Therefore, we implicitly assume the existence 
of a larger region fi D fi such that for all x e fi, f° yd is defined for y e N x U{x} 
and d 6 Q. Also, the operation of (12) should be understood in a modified sense, 
so that f Xtd (t) = f° Xtd for all x e fio - fi, all deQ, and all t. 



A useful corollary establishes that it is not necessary to process all n at the same 
time, but that a complete representation can be built up by defining local ner . orks 
corresponding to windows SCfi, provided that there is enough overlap between 
them. In particular, we will show that if we load the local network S in such a way 
that its initial state coincides with the initial state of the complete network at those 
cells, and if the algorithm operates only on the interior points of 5, keeping the 
state of the boundary points fixed, then the final state of the local network at these 
interior points will correspond to the optimal solution: 

Let /{,(x, d) and / 4 5 (x, d) be the state of the (x, d) cell at time t in the complete 
and local network respectively. We have: 

Corollary 1: Suppose the conditions of the convergence theorem hold in n, and 
consider a set 5 C fi such that the stereogram is not completely ambiguous in 
Si = I[S) (i.e., condition (14) holds for every x e S { ). Suppose that we load the 
local network f s in such a way that for every x e S,f Q s {x,d) = f^{x,d) t for all 
deQ. 
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Then, algorithm (12), modified in such a way that f's{x,d) = f%{x,d) for all 
t> all x £ S - Si, and all d £ Q, will converge to a fixed point f* s for which 
f* s (x,d) = fn(x,d) for all x belonging to unoccluded regions inside S\. 



Proof: 



Consider a region R of constant disparity d such that R' = Rf]Si ^ 0, and let 
B[ be the intersection of R with the boundary of Si. For every point x £ W - B u 
fs(x,d) = 1, by the same arguments as in the convergence theorem. For x £ B lt 
fs{*><*) = 1 too, since f° s (y,d) = /8(y,d) for y £ N x , and (13) holds in n. 
Therefore, for every x £ R\ f l s (x, d) = 1 

On the other hand, for any wrong cluster W d > C R' in layer d } ^ d, since the 
stereogram is not completely ambiguous inside Si, there will be at least one point 
x £ W# such that f\{x, d') = 0. Reasoning as we did before, we have that for all 
points y€N x nw d >r\ R y we will have: 

1^ fz t dj < z2 fz,di 

ZENy Z£Ny 

which implies that /SfL = 0. 

Applying this reasoning recursively, we get, for every x £ R\ that f* s {x, d) = 1, and 
f* s (x, d') = 0, d' =^ d, which, together with the convergence theorem, completes the 
proof.! 

Note that S - Si defines the overlap that should exist among local windows, so that 
the complete representation, defined by 



fi = US l i 



U) 



is correctly formed. 
6.1. Numerical Results. 
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To test the performance of algorithm (12) with random dot stereograms, a 
simulator was implemented in a Symbolics 3600 computer. Figure 27 shows the 
fixed points corresponding to dense and sparse stereograms portraying a pyramid. 
As predicted by the theory, the convergence to the correct solution is fast (less than 
4 iterations) in both cases. In the case of the sparse stereogram, the boundaries are 
slightly misplaced, but, as can be verified by direct inspection of the stereogram, 
all the dots are correctly located. The fixed point corresponding to the synchronous 
operation of (6) (obtained after 11 iterations) is also presented, for comparison. As 
we can see, the WTA algorithm (12) converges much faster to a much more precise 
result. 

7. Recontruction of Real Images. 

To apply this algorithm to the processing of real images, there are some 
modifications and extensions that should be made. 

7.1. Neighborhood size. 

It is possible to increase the robustness of algorithm (12) with respect to the 
presence of noise in the images by increasing the size of the excitatory neighborhood 
(i.e., by postulating a more global MRF prior model) and decreasing the value of 
the parameter a. This increased robustness is traded off by a decrease in resolution: 
small correct regions may be trated as "noise", and therefore disappear from the 
solution. Also, the shape of the piecewise constant regions may be altered (corners 
may be rounded and small concavities "filled in"). 

7.2. Token Selection. 

Trie simple rule (11) is adequate for measuring the compatibility with the 
observations in the case of a synthetic image (such as a random dot stereogram). 
However, it will not work in the case of continuous-toned images of real objects. 
The reasons for this failure are manifold: the distribution of the reflected light 
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Figure 27. (a) Dense Stereogram (density = 0.4) portraying a pyramid, (b) Fixed point for 
algorithm (12) (c) Sparse stereogram (density = 0.1) portraying a pyramid, (d) Fixed point for 
algorithm (12). (c) Fixed point for the Synchronous algorithm (6). 
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varies as the viewpoint is changed (particularly the specular component), and the 
two retinas (cameras) may have different point spread functions, and be affected 
by independent sources of noise. This means that the model for the observation 
process given by equation (3) should be replaced by another thai reflects the process 
of formation of natural images in a more realistic way. The use of a better model 
will cause the term f Q xd in equation (12) to be replaced by a different compatibility 
measure r) X)d which is obtained by first preprocessing the right and left images using 
an operator T whose output should be, ideally, invariant under the changes in 
viewpoint, optics, etc., and then computing a suitable defined distance D between 
the two processed images: 

Vxid = D{Tg R {x + d), Tg L (x)) (15) 

(note that rj may be continuous-valued). 
The new WTA algorithm will be: 

/ /+_i n J 1, if u xA t ) = max d>€Q u xA t ) 

10, otherwise 
u x ,d( t ) = <xVx,d + PN(f {t) ,x,d) (16) 

The operator P N is generated by the enlarged MRF model, and in general it will 
represent a weighted average of the values of the field in the enlarged neighborhood: 

P N (f,x,d)= £ <\x-y\)f x4 (17) 

yEN x 

where N x is the extended neighborhood of x and c(-) denotes a set of parameters 
that depend only on the distance \x - y\, and are related to the prior MRF model 
for the disparity. f° may be chosen as: 



"-to, 



-u . -, if Vx,d = max r6 Q r) x>r 

Jx.d — \_ 

otherwise 



The convergence of this modified algorithm to the correct solution can still be 
guaranteed if condition (13) is replaced by the requirement that the cell corresponding 
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to the correct layer of every non-occluded point should be maximally stimulated at 
time 0, with respect to the other cells in the same column, by neighbors belonging 
to the same constant disparity region: 

<*»7z,4 + P#(/°. x, <k) > arj Xtd + P N (f°, x, d) (18) 

for every region R { \ every x e Ri and every d £ Q. P$ is the operator P N restricted 
to R { : 

P { j){f,x,4= £ <\x- y \)f ytd 

(this modification is necessary to cover the case in which a point near the boundary 
of a constant disparity region is partially stimulated by a wrong cluster outside this 
region which may disappear in succeeding iterations). 

Condition (14), i.e., the requirement that every wrong cluster has less "on" 
neighbors at time on the wrong layer than in the correct one, can now be expressed 
by requiring that for every region R { and every wrong cluster Wj on layer j of R it 
there is at least one point x e Rif] W 3 - such that: 

JW^My)<JW .*,*) (19) 

Under these conditions, it is easy to use the same arguments of the proof of 
the convergence theorem to verify the convergence of algorithm (16). It should be 
remarked that conditions (18) and (19) are sufficient, but by no means necessary; 
(16) may converge to the correct solution even if they are violated by a particular 
stereogram. 

The determination of the optimal operators D and T in equation (15) is a 
difficult — and as yet unsolved problem. One scheme that has often been used is 
to define T as a convolution operator whose kernel is the Laplacian of a Gaussian 
function , and T as: 

T( M J 1 ' ifa6> ° 

10, 



otherwise 
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(see Marr and Poggio, 1979). The rationale for this choice is that the zero crossings 
of the convolution with the Laplacian operator should pick the places where large 
intensity changes occur in both images (i.e., it acts as an "edge detector"), while the 
Gaussian kernel has the elTect of smoothing out the "irrelevant" edges and filtering 
out the noise. One difficulty, however, is that if the Gaussian mask is large enough 
to produce the desired effect, it will also introduce errors in the localization of the 
zero crossings of the convolved images, which will translate into errors in the depth 
of the reconstructed surface (see Clark and Lawrence, 1985). 

We have found that the normalized absolute value of the Laplacian of the 
difference between left and right images: 

— v(x, d) + max r gQ v(x, r) 

77 j ' — - ■■ --- ■ ■ i 

max r £Q v(x, r) — min r6 Q v(x, r) 

with 

v{x t d) = \V 2 {g R (x + d) - g L {x))\ (20) 

has relatively good experimental behavior, but clearly, much more research is 
needed in this area. 

It is important to note that the definition of rj will affect the performance of the 
WTA algorithm, since it will determine the extent to which conditions (18) and (19) 
hold in the initial state; the structure of the WTA network, however, is independent 
of the choice of 77, so that the experimentation with different definitions can be 
done very efficiently. 

7.3. Uniqueness Constraint. 

The definitions (12) and (16) imply the enforcement of the constraint: 

"Each point in the left image should be matched by only one point in the right 
image". 

That is to say, we are enforcing the uniqueness constraint along the left eye 
line of sight, it is also possible to include explicitly the corresponding constraint for 
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the right eye (as in Marr and Poggio, 1976). This is done by replacing (16) (or (12)) 
with: 



/«,«/(* + 1) 



'If if u Xt d(t) = maxd'zQ u Xtd >(t) 

and u x>d = mzxk.d+keQ u x-k,d+k 
.0, otherwise 



For perfect observations, this additional constraint is redundant. If noise or other 
distortions are present, however, this scheme will have better performance, since the 
disparity of "doubtful" points will be left unassigned (the corresponding values of 
the disparity in these locations may be determined after convergence by the robust 
surface reconstruction techniques described in chapter 5). 

As an example of the application of this technique, the processing of a stereo 
pair of aerial photographs is illustrated in figure 28 (this stereo pair is the same that 
was used in chapter 5; see figure 19). Although it is difficult to assess objectively 
the performance of an algorithm on this type of images, the quality of these results 
seems at least equivalent to that obtained by state-of-the-art systems (see Grimson, 
1984). 

7.4. Piecewisc Smooth Surfaces. 

The WTA scheme can also be applied to reconstruct disparity surfaces that 
are piecewise smooth. To do this, it is only necessary to modify the definition of 
the operator P N (equation (16)), so that cells at nearby depths are aJso taken into 
account. Notice that, in order to be consistent with the WTA mechanism, only the 
maximum contribution for any given column should be considered. The modified 
operator is: 

P N {f,x,d)= J2 max{c(|x-i/|,|<f-r|)/ y , r } (21) 

y eN x re7Vrf 

where c(-, •) is some fixed decreasing function of its arguments, and N d is a disparity 
neighborhood defined as the intersection of a closed interval with the set of allowable 
disparities: 

N d = [d- P} d + p]f)Q 
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(a) 




(b) 



(c) 



Figure 28. (a) Stereo pair of aerial photographs, (b) Final state of the WTA network (disparity 
is coded by grey level; white areas have no assigned disparity), (c) Reconstructed sufacc, obtained 
using the algorithm described in section 6 of chapter 5. 



181 



where p is a positive constant. 

The sufficient conditions for the convergence of the modified algorithm, namely, 
that the stereogram should be unambiguous and have well defined boundaries with 
respect to the the modified operator P N , can also be expressed in the form given 
by equations (18) and (19), but now a wrong cluster Wj should be defined as a 
connected region on the disparity layer dj such that /J d . = 1, and dj 7^ d*(x) for 
all x G Wj, where d*(x) is the true disparity at point x. The proof of the convergence 
theorem is straightforward, but the interpretation of these conditions is not obvious, 
and in practice, they are very difficult to verify, so that at this point, the performance 
of this algorithm should be assessed experimentally. 

Pradzny (1984) (see also Pollard et. al., (1984)) has obtained good results for 
the reconstruction of piecewise smooth and "transparent" surfaces (i.e., stereograms 
portraying sets of small interspersed patches that belong to two smooth surfaces, 
one in front of the other) using an operator of the form: 

p N {f,*,Q= £ EWI*-y|.M-H)/ir.r} 

y6N x rEN d 

We believe that the use of (21) should improve the performance in these cases. 

8. Discussion 

In this chapter we have studied a class of recontruction problems that arise 
when the conditional distribution of the observations is a multimodal function, 
which causes the solution to remain ambiguous, even for arbitrarily high signal to 
noise ratio. We identified the signal matching problem as one of the most important 
instances of this class, and gave a probabilistic formulation for it using a MRF 
model to model the disparity surface, so that the optimal estimation algorithms 
derived in chapter 3 could be applied. 

We then presented a different approach to the solution of the problem in 
which the constraints derived both from the prior MRF model for the disparity 
field and from the observations are implemented directly as excitatory connections 
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on a three-dimensional cooperative network of processors (or "cells") with binary 
state space. The steady state of this network can be unambiguously interpreted as a 
disparity surface only if there is exactly one processor in each column whose state 
is equal to 1. This imposes a uniqueness constraint which can be enforced either 
by introducing inhibitory linear connections, or by a "Winner-take-all" mechanism 
that operates within each column. We showed that, for high signal to noise ratio, it 
is possible to define precise sufficient conditions (which are usually met in the case 
of synthetic images) for the convergence of the state of this "WTA" network to the 
correct solution in a small number of iterations. 

The experimental performance of this algorithm with random dot stereograms 
is excellent; it produces accurate reconstructions in a very short time (in less than 5 
iterations). In the case of the reconstruction of real objects from stereoscopic 
photographs, this algorithm — with some modifications — produces results 
comparable with those obtained by more complicated schemes that are considered 
"state of the art", and it has the advantage of being directly implementable in 
parallel hardware. 

Tt should be noted that the performance of the stereoscopic vision of human 

beings on similar data is still dramatically superior to that of this, or any other 

existing artificial system. Some issues that should be addressed for the development 

of more effective algorithms are the following: 

(i) More realistic models for the observation process that take into account 
the nature of the relative distortions of the left and right images should be 
constructed. This should lead to the definition of optimal combinations of 
tokens for the matching process. The precise nature of the optical system 
used (which may cause problems like non-horizontal epipolar lines; vertical 
disparities, etc.) should also be taken into account. 

(ii) The use of more sophisticated prior models for the disparity field — 
including a coupled line field as described in chapter 5 — should be 
investigated. 

(iii) Since the intensity edges and the regions of uniform intensity (or uniform 
texture) of the images are natural candidates for becoming stereo matching 
tokens, and the location of depth and intensity (or texture) edges is likely 
to be correlated in a natural scene, the integration of edge detection; 
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image segmentation; stereo matching and surface reconstruction into a 
single estimation process should produce very good results. The Bayesian 
approach, and the use of coupled MRF models for describing surfaces 
and edges that we have presented in this thesis should provide a unified 
framework for performing this integration. We discuss this point further 
in the next chapter. 
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Appendix 6.A 
DYNAMIC PROGRAMMING APPROACH TO SIGNAL MATCHING 



Consider the one-dimensional version of the signal matching problem described 
in section 2. To compute the MAP estimate, we need to find the global minimum 
of: 

Uj>(d; g)=~j: V[di, dj) + J £ <f> d {i) In 2 + 

+? EU - AiWM^M - **(»' + *)) (i) 

(i.e., equation (4)) The use of the dynamic programming algorithm described in 
appendix 4.B is complicated by the fact that, given the boundaries L n between 
regions of constant disparity, the optimal estimate for d in the interval (Z^Lf+i] 
depends on the estimate on (L,-_i,L,-], since this last choice determines the extent 
of the occluded region. 

However, if we assume that the size of the regions of constant disparity is 
relatively large compared with the size of the occluded areas (as it normally happens 
in most practical cases), we can estimate d given L n using the formula: 

2((A-,A-+i|) = 

= {*: £ S(9L{i)-9R[i + k))< £ 6{g L {i) - gn{i + /)), for alU £ Q} (2) 
t=L,-+l t=L,+l 



Defining: 



and 



Gk,l= £ S(g L (z)-g R (i^d((k,l}))) (3) 

t'=lfc+i 



A A 

A; = max(0, d t _i — <£,) 



185 



(note that A t - corresponds to the length of the occluded region when a change in 
the estimated disparity occurs), we get that 

Ui>(d;g) = ~^U(L) 

with 

U{L) = G lfLx + Ai In 2 + <3 Ll+Al+ i, La + 

+A 2 In 2 + ... + G Ln+An+ltN 

We can now perform the global minimization ofU using the dynamic programming 
scheme of appendix 4.B. In this case, however, it is better to use "forward" 
recursions, (in the sense that now Fj(k) will represent the cost associated with 
putting j boundaries, in the best possible locations, in the interval [1, k}\ because 
occlusion, as we have defined it, always takes place from left to right. We have 
then: 

F {k) = Gi, fc 

L {k) = 1 

F j+1 {k) = inf ' {C? f - +Ai +j,A + Fj{i) + Ay In 2} 

t<A: 

Lj+i{k) = {L:C? L+Ay+ i, fc + F,{L) + Ay In 2 = F j+i (k)} 
The optimal location of the boundaries, for any given n is: 

S n = {L n (N), L n _!(X n (iV)), . . , Li(L2(. . .(Ln{N)). . .)} 

The optimal configuration is computed using (2), and the corresponding energy, 
using (1). 

Note that as the size of the regions of constant disparity decreases, (2) may not 
be well defined (the optimal estimate d may not be unique) and a more complex 
optimization procedure may be required. 
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Chapter 7 
CONCLUSIONS 



In this thesis we have presented a probabilistic approach to the solution of 
a class of perceptual problems. We showed that these problems can be reduced 
to the recontruction of a function on a finite lattice from a set of degraded 
observations, and derived the Bayesian estimators that provide an optimal solution. 
We have also developed efficient distributed algorithms for the computation of these 
estimates, and discussed their implementation in different kinds of hardware. To 
demonstrate the generality and practical value of this approach, we studied in detail 
several applications: the segmentation of noise-corrupted images; the formation of 
perceptual clusters; the recontruction of piecewise smooth surfaces from sparse data 
and the reconstruction of depth from stereoscopic measurements. 

This methodology also permits, in principle, the incorporation of more than one 
modality of observations into a single estimation process, as well as the simultaneous 
estimation of several related functions from the same data set. This makes one hope 
that this framework could be useful in the solution of difficult problems that require 
such an integrated approach. We mention two examples: 

1. We mentioned in chapter 6 that the stereo matching problem in real situations 
has not been solved yet in a satisfactory way. The same can be said of other related 
perceptual problems such as: edge detection; image segmentation; the recovery 
of the shape of an object from a single two-dimensional image (the "shape form 
shading" problem), and the segmentation of a scene into distinct objects, as well 
as the recovery of their three-dimensional structure from the analysis of images 
formed at successive instants of time (the "structure from motion" problem). All 
these problems are obviously related, and it is intuitively clear that the individual 
solutions that can be obtained should improve if the mutual constraints that the 
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solution of each individual problem imposes on the others were taken into account. 
Thus, the presence of a brightness edge should increase the likelihood of a depth 
edge, and viceversa; the depth estimated from stereo should be compatible with 
the shape derived from shading; points belonging to the same region in an image 
should move together, etc. We believe that these constraints can be incorporated 
in the potential functions of the corresponding MRF models (in particular, of the 
coupled fields that represent the "lines" or edges in each case; see chapter 5). 

2. The processing and interpretation of geophysical information (as is done, for 
example in oil prospecting) attempts to reconstruct subterranean geological structures 
from information provided by a set of qualitatively different measurements, such as 
those obtained by: gravimetric and magnetometric surveying; reflexion seismology; 
measurements of physical properties taken vertically along bore holes ("well logs"), 
etc. Since all these measurements are obtained independently, their joint conditional 
probabilities can be easily determined, and since all of them refer to the same 
physical structures, their processing can, in principle, be integrated into a single 
estimation process, which should greatly increase the reliability of the results. 

The above considerations may be taken one step further. Ultimately, the results 
one is interested in are not only the quantitative reconstruction of some surfaces, but 
the symbolic description of the scene in terms of functional structures or "objects". 
On the other hand, the prior knowledge about the occurance of a particular object 
or class of objects might greatly simplify the tasks of the "low level" processors 
(for example, a letter recognition algorithm should greatly benefit from the use 
of context, given the probabilities of occurance of certain letter combinations or 
words). The Bayesian approach provides a common "language" that may allow 
these low-level and high-level (or symbolic) processes to communicate and mutually 
enhance their performance. 

As a simple example of this situation, suppose that we are interested in finding 
a symbolic description of a binary pattern / in terms of a set of geometric objects 
(such as squares, triangles, etc.) that are characterized by some parameters (such 
as position, orientation, size, etc.) for whose values we have some prior probability 
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knowledge. 

Given a description P, i.e., a list of objects with a set of particular values for 
their parameters, we can find a binary field q which corresponds to the boolean sum 
of the indicator functions of the objects included in D: 

if an object in D covers pixel i 



J 1 ' 
lo, 



otherwise 

We can now write the joint prior distribution for the field / (which represents the 
actual intensity of the noise-free image) and its description as: 

P(f, D) = P(f | D)P(P) 

To compute P(f | P), we assume that / is a first order MRF whose configuration 
is biased by P: 

P(f \»)=\ exp[--L £ V(f it /y) + X £ qJi] 

P(D) can be computed from the prior probabilities for the occurance of each type 
of object, and from the prior distributions for the values of the corresponding 
parameters. Since the conditional distribution of the observations depends directly 
only on /, the posterior distribution will be: 

P(9 I f)P[f, D) 



P(f, D | g) 



P{9) 



where P(g) is a constant From this expression we can compute the optimal estimates 
for / and D using methods similar to the ones developed here. 

We will now present a summary of our main results and a list of some interesting 
open technical questions. 

1. Summary of our Main Results. 

1.1. Optimal Baycsian Estimators. 
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Several researchers have used Bayes theory and MRF models for the restoration 
of piecewise uniform images. It has been implicitly assumed by all of them that the 
maximization of the posterior probability is the best possible performance criterion. 
We have shown that it is possible to choose other criteria that are better adapted 
to each particular problem, and have derived the corresponding optimal estimators, 
which not only improve substantially the quality of the results (particularly for 
low signal to noise ratios), but also lead to more efficient and better behaved 
computational schemes. 

1.2. General Monte Carlo Algorithms. 

We have shown that the optimal Bayesian estimators can be obtained from 
the observation of the equilibrium behavior of a MRF (which in physical terms 
correspond to a ferromagnet subject to a spatially varying external magnetic field). 
This behavior can be effectively simulated by Monte Carlo procedures which 
generate a regular Markov chain with an invariant Gibbs measure. 

This method differs from "simulated annealing" (which has been used to 
approximate the MAP estimator) in that it is based on the collection of statistics of 
the evolution of the chain at a fixed temperature, while the latter attempts to find the 
ground state of the coupled system by slowly decreasing it. From a computational 
viewpoint, our method exhibits a faster and more consistent convergence behavior. 

1.3. Parallel Implementations. 

The implementation of this general Monte Carlo procedure in parallel hardware 
was discussed. We proved that the Gibbs sampler (but not the Metropolis or Heat 
Bath algorithms) will produce consistent results in this case. 

1.4. Reconstruction of Piecewise Constant Funcions. 

The problem of reconstructing a piecewise constant function from noisy 
(but dense) observations was formulated in probabilistic terms, and the form of 
the optimal estimators derived. For the one-dimensional case, we presented a 
deterministic algorithm with minimal complexity which computes (exactly) the 
MAP estimate of binary fields. For the two-dimensional case, we presented a 
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method for improving the computational efficiency of the "Simulated Annealing" 
scheme for approximating the MAP estimator, and derived a fast algorithm for 
approximating the optimal (MPM) Bayesian estimator. 

We also presented a maximum likelihood procedure, which based on an analysis 
of the residual ("innovations") process permits the simultaneous estimation of the 
field and the parameters of the system. We applied this technique to the construction 
of a parameter-free algorithm for the reconstruction of arbitrary binary patterns. 

1.5. Formation of Perceptual Clusters. 

We suggested that the process of formation of perceptual clusters of certain 
dot patterns can be modeled in terms of the estimation of binary images corrupted 
by multiplicative noise, and illustrated the application of our estimation algorithm 
to this task. 

1.6. Reconstruction of Piecewise Continuous Surfaces. 

The problem of simultaneously detecting the discontinuities and recontructing 
a piecewise smooth surface from sparse observations was cast in the Bayesian 
framework. A model consisting of two coupled MRF's: one representing the 
depth and the other the boundaries between continuous regions, was adapted to 
our problem. Since the straightforward use of the general Monte Carlo algorithm 
for finding the optimal estimate is computationally unfeasible in this case, an 
approximation (which showed an excellent experimental performance with both 
synthetic and "real" data) was derived and implemented. We also developed, and 
heuristically justified a fast algorithm that produces results that are practically 
indistinguishable from the optimal ones. The implementation of these procedures 
in digital parallel hardware, as well as in hybrid and analog networks was also 
discussed. 

1.7. Signal Matching. 

We presented a class of problems that is characterized by the fact that the 
conditional probability distribution of the observations P(g \ f) is multimodal (as a 
function of/), which means that the solution remains ambiguous, even for arbitratily 
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high signal to noise ratios. We studied a prototype problem of this class: the signal 
matching problem (in particular, the reconstruction of depth from stereoscopic pairs 
of images), and showed that it is possible, in principle, to find the solution using the 
general estimation procedures that we have developed (although the computational 
cost will be high in the general case). We also presented a different scheme, which 
is based on the direct implementation of the local constraints (generated by the 
probabilistic model) in a highly distributed cooperative network of a particular 
form: a "Winner-Take-All" network, and showed that the state of this network 
will converge to the correct solution in a few iterations (in the high SNR case). The 
application of this technique to the reconstruction of the depth of real objects from 
stereoscopic photographs was discussed, and some modifications to the algorithm 
were introduced, which permitted us to produce results which compare favourably 
with those of other "state of the art" algorithms. 

2. Open Technical Questions. 

2.1. Stochastic Models. 

We have shown throughout this work the richness and versatility of simple (first 
and second order) MRF models. It is clear,however, that there are classes of physical 
structures whose behavior cannot be adequately modeled by these processes (as a 
simple example, consider images formed by clusters of blobs of certain average size). 
There have been some attempts to model these and other "textured" patterns via a 
hierarchy of independent MRFs: one that represents the structure of the image, in 
terms of regions of uniform texture, and individual models for each textured regions. 
This representation, however, is not very convenient for estimation purposes. A 
more rigorous approach has been suggested by Grenander (1984), who has proposed 
the use of generalized Markovian fields to model complex patterns; these fields 
consist of several layers of "generators", which in the first layer correspond to 
grey levels, and in the succeeding ones, to features of increased complexity (lines, 
corners, etc.). It is not clear, however, how to use this approach to construct models 
of textured images; objects of different shapes, etc. 
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These considerations suggest the need for much more research in this area, 
which should include, perhaps, the use of probabilistic models that are not based 
on the Gibbs distribution. 

2.2. Multiple Scale Representations. 

It is the current view that the production of high-level (symbolic) descriptions 
of a scene should be mediated by the construction of numerical descriptions of 
the surfaces involved at different "scales". The parameters that describe a MRF 
play in some sense the role of scale parameters (see figure 1 of chapter 1; section 
5 of appendix 4.B and section 6 of chapter 5); this identification, however, is not 
completely satisfactory. A good multiscale representation should feature not only a 
progressive blurring of detail, but the aggregation of substructures into larger units 
in a way that is not accomplished by the current algorithms. 

2.3. Parameter Estimation. 

Intimately liked with the previous questions, is the determination of the optimal 
set of parameters of a given model from noisy samples. The maximum likelihood 
method that we have presented here (see chapter 5) becomes computationally 
unfeasible as the complexity of the model (the dimensionality of the parameter 
space) increases; therefore, alternative procedures need to be derived (for instance, 
the use of time-varying algorithms, such as the one presented in section 6 of chapter 
5 should be more rigorously investigated). 

A related (and more difficult) question is the selection of the optimal model 
from a certain class given only the noisy observations. It is possible that the ideas of 
Rissanen (1978, 1981, 1983) about "minimum description length" schemes, and also 
of Akaike (1977) about generalized maximum likelihood methods could be useful in 
this connection, although the high computational complexity of the present problem 
might limit the applicability of these techniques. 

2.4. Fast Algorithms. 

The practical use of the general Monte Carlo estimation algorithms of chapter 
3 is limited by the relatively large number of iterations needed for the convergence 
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of these systems. A very important question, then, is how to improve on the 
convergence time without sacrificing the power of these methods. The use of 
"multigricT type strategies (Brandt, 1973; Terzopoulos, 1984), which in the present 
case may take the form of "block-spin"" algorithms, such as the one presented in 
chapter 4 (see also White, 1983) should be investigated. 

Also in this connection, it should be interesting to find more rigorous justifications 
for the performance of the fast deterministic schemes that we have developed, based 
on heuristic considerations, in chapters 4 and 5, to see if it is possible to find some 
general principles that may guide the extension of these schemes to other, more 
general cases. 

2.5. Analog Computers. 

It would be interesting to actually construct prototypes of the hybrid and analog 
networks described in chapter 4 and 5, to assess the practicality and performance of 
such schemes. A more intriguing possibility is to exploit the isomorphism between 
the estimation process of a MRF from noisy data, and the equilibrium behavior 
of a ferromagnet with a coupled, spatially varying external field (see chapter 3), to 
construct very fast, special purpose "quantum" computers to perform the former 
task. 
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